Consensus scoring occurs when the scoring key for a test is based upon the responses of the norm group. Consensus scoring is an attractive alternative to traditional methods of creating a scoring key for ability tests, especially useful when experts disagree about the correct answers to test items, as they do in the area of emotions and emotion perception. Of the many variations of consensus scoring, mode consensus scoring (the most frequent response in a norm group is given a score of 1, and all other responses a score of 0) and proportion consensus scoring (each respondent's score on an item is equal to the proportion of the norm group who match the respondent's answer) are the most widely used and the most psychometrically promising. This paper demonstrates that mode consensus scoring is biased against smaller sub-groups within the norm group: when sub-groups differ in their modal responses, the size of the sub-groups will influence the average group score. No known scoring option eliminates this bias. In contrast, proportion consensus scoring is not necessarily biased against smaller groups, although bias does occur in some extreme situations. Proportion consensus scoring is therefore the preferred consensus scoring option at this time.
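The two scoring rules defined in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's own procedure: the function names, the sample norm group, and the decision to give every tied modal response a score of 1 are assumptions made here.

```python
from collections import Counter


def mode_score(response, norm_responses):
    """Mode consensus: 1 if the response is the most frequent one in the norm group, else 0.

    Assumption: when several responses tie for most frequent, each of them scores 1.
    """
    counts = Counter(norm_responses)
    top = max(counts.values())
    return 1 if counts[response] == top else 0


def proportion_score(response, norm_responses):
    """Proportion consensus: share of the norm group that gave the same response."""
    return Counter(norm_responses)[response] / len(norm_responses)


# Hypothetical norm group for one item: six people chose A, three chose B, one chose C.
norm = ["A"] * 6 + ["B"] * 3 + ["C"]

score_mode_a = mode_score("A", norm)
score_mode_b = mode_score("B", norm)
score_prop_a = proportion_score("A", norm)
score_prop_b = proportion_score("B", norm)
```

Under mode scoring, only the modal response earns credit, whereas proportion scoring gives partial credit to less common responses in line with how many norm-group members chose them.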
Emotional Awareness: Computer and Hand Scoring of an Open-Ended Test. Kimberly A. Barchard, Richard D. Lane, and Bryan D. Watson ...
Researchers now know that when theoretical reliability increases, power can increase, decrease, or stay the same. However, no analytic research has examined the relationship of power to the most commonly used type of reliability (internal consistency) and the most commonly used measures of internal consistency, coefficient alpha and ICC(A,k). We examine the relationship between the power of independent samples t tests and internal consistency. We explicate the mathematical model upon which researchers usually calculate internal consistency, one in which total scores are calculated as the sum of observed scores on K measures. Using this model, we derive a new formula for effect size to show that power and internal consistency are influenced by many of the same parameters but not always in the same direction. Changing an experiment in one way (e.g., lengthening the measure) is likely to influence multiple parameters simultaneously; thus, there are no simple relationships between such changes and internal consistency or power. If researchers revise measures to increase internal consistency, this might not increase power. To increase power, researchers should increase sample size, select measures that assess areas where group differences are largest, and use more powerful statistical procedures (e.g., ANCOVA).
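As a point of reference for the model described above, in which total scores are the sum of observed scores on K measures, coefficient alpha can be computed with the standard textbook formula. The sketch below shows only that standard formula; it does not reproduce the paper's new effect size formula, and the function name and sample data are hypothetical.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of rows, one row per person, one column per measure.

    Standard formula: alpha = K / (K - 1) * (1 - sum of measure variances / variance of totals).
    """
    k = len(scores[0])
    # Sample variance of each of the K measures across people.
    measure_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total score (sum over the K measures).
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(measure_vars) / total_var)


# Hypothetical data: four people, three measures that rank people identically.
data = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = coefficient_alpha(data)
```

Because the three hypothetical measures are perfectly correlated with equal variances, alpha reaches its maximum of 1 here; real data would give a lower value.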
Data entry errors can have catastrophic effects on the results of a statistical analysis. Therefore, researchers often spend considerable effort checking their data. This paper compared the effectiveness of three data checking methods (double entry, read aloud, and visual checking) using the types of data and data entry personnel that are typically used in psychological research. To compare these techniques, we created 20 data sheets and entered them into the computer. Next, we deliberately introduced errors into this data set. The participants' job was to locate and correct these errors. A total of 340 undergraduates participated in this study. Of these, 80 had previous data entry experience and 260 did not. Double entry was far superior to read aloud and visual checking, both among people with previous data entry experience and among people without previous experience. Among people with no previous experience, read aloud and visual checking had more than 20 times as many errors as double entry. In addition, double entry was preferred over visual checking. Thus, although double entry takes slightly longer, it is clearly worth the extra effort.
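The general idea behind double entry, entering the same data twice and flagging every cell where the two versions disagree, can be sketched as follows. The abstract does not describe the software or procedure used in the study, so the function name, the data layout, and the sample values are all assumptions for illustration.

```python
def double_entry_mismatches(first_entry, second_entry):
    """Compare two entries of the same data sheet and list the cells that differ.

    Each entry is a list of rows; returns (row, column, first value, second value) tuples.
    Assumption: both entries have the same number of rows and columns.
    """
    mismatches = []
    for row, (first_row, second_row) in enumerate(zip(first_entry, second_entry)):
        for col, (a, b) in enumerate(zip(first_row, second_row)):
            if a != b:
                mismatches.append((row, col, a, b))
    return mismatches


# Hypothetical example: the second entry has a typo in row 1, column 2.
entry_one = [[23, 4, 5], [31, 2, 7]]
entry_two = [[23, 4, 5], [31, 2, 1]]
flagged = double_entry_mismatches(entry_one, entry_two)
```

Each flagged cell would then be checked against the original paper data sheet to decide which of the two values is correct.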
This study compared the effectiveness of simulation-based instruction to traditional teacher-directed instruction about water resource management in Las Vegas. Subjects, undergraduate students recruited from Psychology and Environmental Studies departments, participated in one of two treatments. All participants were given a pretest prior to instruction, a post-test immediately following instruction, and a retention test 4 weeks after instruction. Evaluation instruments provided overall scores and gauged student learning in topic areas and at different question difficulty levels, as well as attitudes toward the environment and water management. The treatments differed only in how students interacted with the system after receiving background information on the Las Vegas Valley's water issues. Students in the traditional group used a lecture-format presentation of graphed results to show effects of changes to the system, while the students in the simulation-based group manipulated the inter...
Papers by Kimberly Barchard