Lorrie A. Shepard, PhD, is University Distinguished Professor at the University of Colorado Boulder. Her research focuses on psychometrics and the use and misuse of tests in educational settings. Her technical work has contributed to validity theory, standard setting, and statistical models for detecting test bias. Her research studies on test use have addressed the identification of learning disabilities, readiness screening for kindergarten, grade retention, teacher testing, effects of high-stakes accountability testing, and, most recently, the use of classroom assessment to support teaching and learning.
Early presidents of the American Educational Research Association were leaders in the testing movement. Their intentions were to improve education by means of testing, which included both IQ and achievement tests. Early measurement experts acknowledged in scholarly articles that IQ tests could not measure inherited ability of groups with vastly different opportunities to learn, and yet ability testing was promoted as a beneficial means for matching instruction to individual differences until the insights of the civil rights era in the 1960s. Standard achievement measures were developed importantly to allow valid comparisons across school systems and over time, but the representations of learning that were adequate 100 years ago came to have distorting effects on teaching and learning. Today’s young psychometricians have opportunities to create new assessments in partnership with curriculum experts, but they should remain alert to the ways that well-intentioned assessment systems have been corrupted in the past.
Background/Context: The evolution of validity understandings from mid-century to now has emphasized that test validity depends on test purpose, adding consequence considerations to issues of interpretation and evidentiary warrants. Purpose: To consider the tensions created by multiple purposes for assessment and sketch briefly how we got to where we are; furthermore, to address two critically important purposes: the accountability purpose versus the learning purpose for assessment. Research Design: This is an analytic, closing commentary to this special section. Conclusions: When a test is used as an educational reform, the theory of action behind the reform should be made explicit and that theory or series of claims and assumptions is what should be examined in the validity evaluation. As to the prospect of improving the teaching profession by the use of value-added methods, I believe that this is an overly ambitious use of a potentially useful statistical tool. As these systems are being implemented, we can and should conduct validity studies designed to detect plausible shortcomings and side effects as well as intended outcomes.
This book is highly recommended for all involved in test development or who have any interest in the use of tests in education or in other fields. The necessary mathematics are presented clearly but never obscure the important messages in the book. It will certainly be referred to constantly in my future work in this area. --Educational Research

The fundamental goal of the Measurement Methods for the Social Sciences series is to make complex measurement concepts, topics, and methods available to readers with limited mathematical background but a strong desire to understand, as well as use, methods that are on the forefront of social science assessment. With this book on item bias detection methods, Gregory Camilli and Lorrie Shepard have achieved this goal admirably. --from the Foreword by Richard M. Jaeger

What can item bias methods do--and not do--when applied to real test data? Aimed at helping researchers understand how item bias methods work, this book provides practical advice and specific details on the most useful methods for particular testing situations. Beginning with a review of early bias methods and the fairness issues associated with the topic of test bias, the authors explain the logic of each method in terms of how differential item functioning (DIF) is defined by the method--and how well the method can be expected to work in various situations. In addition, chapters include a summary of findings regarding the behavior of the various indexes in empirical studies, especially their reliability, correlation with known bias criteria, and correlations with other bias methods. The book concludes with a set of principles for deciding when DIF should be interpreted as evidence of bias.
... In classrooms, formative assessment can readily be done in the context of mathematics problems, history papers, and science experiments, focusing on the key concepts and competencies that are the aims of a given instructional unit. Interim tests could similarly ...
Educational Measurement: Issues and Practice, 2018
To support equitable and ambitious teaching practices, classroom assessment design must be grounded in a research-based theory of learning. Compared to other theories, sociocultural theory offers a more powerful, integrative account of how motivational aspects of learning (such as self-regulation, self-efficacy, sense of belonging, and identity) are completely entwined with cognitive development. Instead of centering assessment within systems that support use of interim and end-of-year standardized tests, we argue for a vision of formative assessment based on discipline-specific tasks and questions that can provide qualitative insights about student experience and thinking, including their identification with disciplinary practices. At the same time, to be consistent with a productive formative assessment culture, grading policies should avoid using points and grades “to motivate” students but should create opportunities for students to use feedback to improve their work. We argue for...