International Journal of Multiple Research Approaches, 2011
As the United States strives to meet the challenges of improving the academic achievement of African American students in large urban school districts, researchers are beginning to examine cultural awareness and beliefs of urban teachers. The present study used a parallel mixed methods design to examine the score-validity and score-reliability of a cultural awareness and beliefs inventory (CABI). This 46-item inventory measured the perceptions of 1,253 urban teachers. Specifically, the CABI measured urban teachers' cultural awareness and beliefs about their African American students. Construct validity was addressed by establishing internal consistency and content-related, structural, and substantive validities derived from analyses of two data strands. Implications of the study for policy makers, administrators, and educators, and directions for future research are provided.
indices of moral decision-making assessed by the Defining Issues Test have been limited to correlational analyses. This study used Harm, Fairness, Ingroup, Authority and Purity to predict overall moral judgment and individual Defining Issues Test-2 (DIT-2) schema scores using responses from 222 undergraduates. Relationships were not confirmed between the separate foundations and the DIT-2 indices. Using the MFQ moral judgment items only, confirmatory factor analyses confirmed higher order constructs called Individualizing and Binding foundations. Structural models using these higher order factors fitted the data well, and findings indicated that the Binding foundations significantly positively predicted Maintaining Norms and negatively predicted both overall moral judgment (N2) and the Postconventional Schema. Neither Individualizing nor Binding foundations significantly predicted Personal Interest. While moral judgments assessed by DIT-2 may not be evoking the MFQ foundations, findings here suggest the MFQ may not be a suitable measure for capturing more advanced moral functioning.
Planned missingness in commonly administered proportions of LibQUAL+® and Lite instruments may lead to loss of information. Data from three previous administrations of the LibQUAL+® protocol were used to simulate data representing five proportions of administration. Statistics of interest (i.e., means, adequacy and superiority gaps, standard deviations, and Pearson and polychoric correlations) and their confidence intervals (CIs) from simulated and real data were compared. All CIs for the statistics of interest for simulated data contained the original values. Root mean squared errors and absolute and relative biases showed that accuracy in the estimates decreased as the Lite proportion increased. The recommendation is to administer the Lite version to no more than 20% of the respondents if the purpose of the data collection is to conduct any inferential analysis. If researchers are interested in calculating means alone, up to 80% Lite version may be used to capture the true values adequately. However, standard deviations need to be interpreted to understand the quality of the means. Loss of accuracy in estimates may be compounded in analyses that use at least two statistics of interest.
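The accuracy measures named in this abstract can be illustrated with a minimal sketch. The definitions below are the usual textbook ones and are assumed here, not taken from the paper; in particular, absolute bias is computed as the mean absolute error, which is only one common convention. The function name `accuracy_summary` and the sample values are hypothetical.

```python
import math


def accuracy_summary(estimates, true_value):
    """Summarize how well repeated estimates recover a known value.

    Returns (bias, absolute_bias, relative_bias, rmse).
    Assumed definitions (not taken from the paper):
      bias          = mean(estimate - true)
      absolute_bias = mean(|estimate - true|)
      relative_bias = bias / true
      rmse          = sqrt(mean((estimate - true)^2))
    """
    if not estimates:
        raise ValueError("estimates must not be empty")
    if true_value == 0:
        raise ValueError("relative bias is undefined when the true value is 0")

    n = len(estimates)
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / n
    absolute_bias = sum(abs(err) for err in errors) / n
    relative_bias = bias / true_value
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    return bias, absolute_bias, relative_bias, rmse


# Hypothetical example: four simulated means compared with an original mean of 7.0.
bias, absolute_bias, relative_bias, rmse = accuracy_summary([7.0, 7.2, 6.8, 7.4], 7.0)
```

Reporting RMSE alongside bias is useful because positive and negative errors can cancel in the bias while still inflating the RMSE.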
It is not uncommon to use unidimensional item response theory models to estimate ability in multidimensional data with computerized adaptive testing (CAT). The current Monte Carlo study investigated the penalty of this model misspecification in CAT implementations using different item selection methods and exposure control strategies. Three item selection methods, maximum information (MAXI), a-stratification (STRA), and a-stratification with b-blocking (STRB), were investigated with and without the Sympson-Hetter (SH) exposure control strategy. Calibrating multidimensional items as unidimensional items resulted in inaccurate item parameter estimates. Therefore, MAXI performed better than STRA and STRB in estimating the ability parameters. However, all three methods had relatively large standard errors. SH exposure control had no impact on the number of overexposed items. Existing unidimensional CAT implementations might consider using MAXI only if recalibration as a multidimensional model is too expensive. Otherwise, building a CAT pool by calibrating multidimensional data as unidimensional is not recommended.
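As an illustration of the maximum information (MAXI) rule mentioned above, the following sketch picks the unused item with the largest Fisher information at the current ability estimate. The abstract does not state which IRT model was used, so a two-parameter logistic model without a scaling constant is assumed here; the function names and the item pool are hypothetical, and the stratification and Sympson-Hetter steps are not shown.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_max_info(theta, pool, administered):
    """Return the index of the unused item with maximum information at theta.

    pool is a list of (a, b) tuples; administered is a set of used indices.
    """
    candidates = [i for i in range(len(pool)) if i not in administered]
    if not candidates:
        raise ValueError("no items left in the pool")
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))


# Hypothetical pool of (discrimination, difficulty) pairs.
pool = [(0.5, 0.0), (1.5, 0.0), (2.0, 3.0)]
next_item = select_max_info(0.0, pool, administered=set())
```

Because information grows with the square of the discrimination parameter, this rule tends to favor highly discriminating items early in the test, which is the exposure problem that stratification and exposure control are meant to address.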
LibQUAL+® is an instrument purported to measure three dimensions of library service quality: service affect, library as a place, and information control. Since changes were made to the instrument in 2003, however, no confirmatory factor analyses have been published in peer-reviewed journals affirming the three-factor structure of LibQUAL+®. These deficiencies were addressed by testing the hypothesized three-factor structure and the stability of that structure over time. Specifically, data from three samples (n = 550; n = 3261; n = 2103) were collected over a five-year period and analyzed using a multi-group confirmatory factor analysis. Results suggest that the theoretical model fits the data across the three samples and demonstrates factorial invariance over time. Multicollinearity between affect of service and information control, however, indicates that service quality may be measured as two dimensions rather than three, providing a more parsimonious explanation of service quality.
The LibQUAL+® instrument measures users' perceptions of library service quality; three factors are evaluated: Affect of Service, Information Control, and Library as Place. Although previous studies have assessed the factorial invariance of LibQUAL+®, factorial invariance by itself is insufficient for score comparability across groups. Stronger levels of measurement invariance need to be established. This study systematically tested the measurement and structural invariance of LibQUAL+® scores in a sample of 1551 undergraduate students, 707 graduate students, and 134 faculty members. Multi-group confirmatory factor analyses showed that full measurement invariance did hold between students and faculty for the complete instrument. Building on the measurement invariance, structural invariance models showed that factor variances were equivalent across user groups, but factor covariances and means differed. Faculty had higher perceptions of Affect of Service and undergraduate students had higher perceptions of Library as Place compared to the other groups.
The mathematics teaching efficacy beliefs of preservice elementary teachers have been the subject of several studies. A widely used measure in these studies is the Mathematics Teaching Efficacy Beliefs Instrument (MTEBI). The present study provides a detailed analysis of the psychometric properties of the MTEBI using Bayesian item response theory. We discuss local dependence between item pairs, psychometric quality of the items, validity of the scoring procedure, and measurement accuracy for teachers with different efficacy levels. Our findings suggest that in its present form, the test reliability of the MTEBI may not be as high as assumed to date. The scale, wording, and placement of the items need revision. Moreover, additional items need to be constructed to measure below-average levels of efficacy more accurately. Ordering the items according to difficulty, we describe the structure of mathematics teaching efficacy beliefs and draw some implications for mathematics teacher educators.
The authors examined the distributional properties of 3 improvement-over-chance, I, effect sizes, each derived from linear and quadratic predictive discriminant analysis and from logistic regression analysis for the 2-group univariate classification. These 3 classification methods (3 levels) were studied under varying levels of data conditions, including population separation (3 levels), variance pattern (3 levels), total sample size (3 levels), and prior probabilities (5 levels). The results indicated that the decision of which effect size to choose is primarily determined by the variance pattern and prior probabilities. Some of the I indices performed well for some small sample cases, and quadratic predictive discriminant analysis I tended to work well with extreme variance heterogeneity and differing prior probabilities.
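For readers unfamiliar with the improvement-over-chance index, a minimal sketch follows. It assumes the commonly cited form I = (H_o - H_e) / (1 - H_e), where H_o is the observed hit rate of a classifier and H_e is the hit rate expected by chance, and it assumes the proportional chance criterion (sum of squared prior probabilities) for H_e. The abstract does not spell out either choice, so both are assumptions, and the function names are hypothetical.

```python
def proportional_chance_rate(priors):
    """Chance hit rate under the proportional chance criterion (assumed here)."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")
    return sum(q * q for q in priors)


def improvement_over_chance(observed_hit_rate, chance_hit_rate):
    """I = (H_o - H_e) / (1 - H_e): share of possible gain over chance achieved."""
    if not 0.0 <= chance_hit_rate < 1.0:
        raise ValueError("chance hit rate must be in [0, 1)")
    return (observed_hit_rate - chance_hit_rate) / (1.0 - chance_hit_rate)


# Hypothetical example: two equally likely groups, 80% of cases classified correctly.
chance = proportional_chance_rate([0.5, 0.5])
i_index = improvement_over_chance(0.80, chance)
```

The same observed hit rate yields a smaller I when the priors are unequal, because the chance hit rate is then higher; this is one reason prior probabilities matter when choosing among the indices.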
The most recent development in the field of Item Response Theory (IRT) has been the evaluation of IRT models as multilevel models, known as Multilevel IRT models (MLIRT). These models offer several statistical and practical advantages over ordinary IRT models. However, models such as 2-PL MLIRT models have not been studied yet. This dissertation consists of two studies, a simulation and a substantiation for an urban school district dataset. The simulation study tested the performance of two-parameter (2-PL) MLIRT models with predictor variables under various conditions that included 3 test lengths (15, 30, and 60 items), 4 sample sizes (200, 500, 1000, and 2000), 2 correlation conditions between the predictor variable and the ability (or attitude) parameter (rpb = .35 and .8), and 4 binomial distributions of the predictor variable (p = 0.1, 0.25, 0.4, and 0.5). The bias and Root Mean Square Deviation (RMSD) values of the item parameters indicated that the distribution of the predictor variable and the correlation between the predictor and the ability (or attitude) parameter did not affect the estimates of 2-PL MLIRT models. These models performed well for sample sizes as low as 500 and test lengths as low as 15, which is lower than the required sample size for ordinary IRT models. Even for a sample size of 200, sufficiently accurate estimates were obtained with more than 300 iterations. The second study investigated the characteristics of the items that measured urban teachers’ perceptions of cultural awareness and beliefs about teaching African American children and tested whether these perceptions were influenced by the teachers’ gender, ethnicity, or teaching experience. Teacher beliefs about teaching African American students, culturally responsive management, and cultural awareness factors were influenced by the ethnicity of the teachers. Culturally responsive management, home and community support, and curriculum and instructional strategies factors were influenced by the teaching experience of the teachers. Items that were biased based on ethnicity or teaching experience were identified. None of the items exhibited gender bias. The study identified items that could be used over other items when the need for a shorter instrument or more informative categories arises.
International Journal of Electronic Commerce, 2010
This study examined channel-migration behavior using a decomposed Theory of Planned Behavior with crossover effects in brick-and-mortar stores and the Internet. An online survey was administered at four research sites (N = 547), and factor analysis and structural equation modeling, with multigroup analysis, were utilized for data analysis. Hedonic beliefs did not influence either of the channels, whereas utilitarian beliefs were significant predictors in both brick-and-mortar stores and the Internet. Additionally, ...
The present study presents the formulation of graded response models in the multilevel framework (as nonlinear mixed models) and demonstrates their use in estimating item parameters and investigating the group-level effects for specific covariates using Bayesian estimation. The graded response multilevel model (GRMM) combines the formulation of graded response models with the discrimination parameter fixed at one for all items by Tuerlinckx and Wang and of two-parameter models by Rijmen and Briggs to offer graded response models with item-specific discrimination parameters. Apart from the contribution to the body of knowledge by formulating GRMMs, the significance of the present study includes providing a meeting point between psychometrics and statistics, overcoming the Neyman-Scott problem by using Bayesian estimation, estimating the abilities of persons with extreme scores, and demonstrating general-purpose software for estimating item response theory parameters. Data from the emotional functioning scale on 11,158 healthy and chronically ill children and adolescents were used from the PedsQL 4.0 Generic Core Scales database to illustrate the model. Estimates for the item parameters from WinBUGS using Bayesian priors and Multilog were compared for the GRMM and the ordinary graded response models, respectively.
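For orientation, the ordinary graded response model with item-specific discrimination is usually written in the following standard form. The notation is assumed here for illustration (person $i$, item $j$, response categories $k = 0, \dots, m_j$), and the multilevel parameterization used in the paper may differ.

```latex
% Cumulative (boundary) probability of responding in category k or higher
P^{*}_{jk}(\theta_i) = P(Y_{ij} \ge k \mid \theta_i)
  = \frac{\exp\{a_j(\theta_i - b_{jk})\}}{1 + \exp\{a_j(\theta_i - b_{jk})\}},
  \qquad k = 1, \dots, m_j,

% with the conventions
P^{*}_{j0}(\theta_i) = 1, \qquad P^{*}_{j,m_j+1}(\theta_i) = 0,

% so that the probability of responding exactly in category k is
P(Y_{ij} = k \mid \theta_i) = P^{*}_{jk}(\theta_i) - P^{*}_{j,k+1}(\theta_i).
```

Here $a_j$ is the discrimination of item $j$, $b_{jk}$ are its ordered thresholds, and $\theta_i$ is the latent trait of person $i$; fixing $a_j = 1$ for all items gives the restricted form that the abstract attributes to earlier multilevel formulations.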
Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and sample sizes. Sample size and test length explained the largest amount of variance in item and person parameter estimates, respectively. There was little difference in item parameter recovery between MML and MCMC in samples with 300 or more respondents. MCMC recovered some item threshold parameters better in samples with 75 or 150 respondents. Bias in threshold parameter estimates depended on the generating value and the type of threshold. Person parameters were comparable between MCMC and MML/expected a posteriori for all test lengths.
Papers by Prathiba Natesan