Courtney Bell, Principal Research Scientist at ETS, completed her doctorate at Michigan State University in Curriculum, Teaching, and Educational Policy after earning her B.A. in Chemistry at Dartmouth College. A former high school science teacher and teacher educator, Courtney looks across actors in the educational system to better understand the intersections of research, policy, and practice. Her studies use mixed methods to analyze the measurement of teaching and the validity of measures of teaching quality, focusing in particular on observational assessments of teaching. Current and recent studies funded by IES, OECD, and the W.T. Grant and Spencer Foundations investigate how administrators learn to use a high-stakes observation protocol, how raters use subject-specific and general protocols, how measures of teaching compare across countries, and the ways in which observation protocols capture high-quality teaching for students with special needs. As part of her work on a research and development project for initial teacher licensure, Courtney is collaborating with colleagues at ETS, TeachLivE, and TeachingWorks (University of Michigan) to develop measures of beginning teachers’ content knowledge for teaching and ability to enact high-leverage practices in ELA and mathematics. Courtney served on technical advisory committees for three Race to the Top states as they implemented teacher evaluation reforms. She has published in scholarly journals including Educational Assessment, Educational Evaluation and Policy Analysis, Journal for Research in Mathematics Education, American Journal of Education, Journal of Education Policy, and Teachers College Record. She also co-edited the 5th edition of the AERA’s Handbook of Research on Teaching.
This policy note is a part of ETS's Equity in Education Series. The series provides a multifaceted exploration of prospects for developing indicators of teaching quality. These measures would move beyond inputs and outcomes to capture important features of what happens in the classroom that could lead to more equitable access to high-quality teaching for traditionally under-served students. This second note in the series describes the case of the teaching quality indicators in the preK area.
There are research syntheses that review what the field knows about various aspects of STEM teacher preparation (e.g., National Research Council, 2000; Wilson, 2011) and reviews of teacher preparation across subjects (e.g., Cochran-Smith & Zeichner, 2013; Cochran-Smith, Villegas, Abrams, Chavez-Moreno, Mills & Stern, 2016). This review takes up yet another, related topic -- the research and measurement approaches used to study STEM teacher preparation. Drawing on recent articles in STEM and general teacher education journals, the review takes a situated perspective and categorizes the research into seven inductively developed purposes: understanding STEM preservice teacher learning and development, improving educator preparation programs, contributing to program accountability, describing and understanding relationships between STEM preparation and valued outcomes, understanding assessments and measurement of STEM preparation, framing and reframing issues of STEM preparation, and understanding teacher educators and their practices. Within each of these purposes, the review summarizes the questions and phenomena under investigation and the methodological approaches used to understand these questions and phenomena. The authors offer insights about the questions and phenomena that have not yet been addressed in each purpose and suggest varied research agendas that could help the field strengthen research on and measurement of STEM teacher preparation.
When it comes to their child’s performance, parents have a rich trove of information, including grades on homework assignments, report card results, statewide standardized testing, and much more. But there’s a critical gap in that collection of data: a scarcity of information about the quality of teaching going on in the classroom.
Drawing on the model used widely in the health-care field, the authors of “Quest for Quality: An Indicator System for Teaching” propose a dashboard of indicators of teaching quality to address this crucial missing link.
This report marks the launch of the Equity in Education Series, which will provide a multifaceted exploration of prospects for developing indicators of teaching quality. Equity is a major part of this portrait of teaching. There is robust data showing large inequalities in access to good teachers across America, especially for populations with low income, minoritized students, and those for whom English is not their first language.
Educational Assessment, Evaluation, and Accountability, 2019
Researchers and practitioners sometimes presume that using a previously "validated" instrument will produce "valid" scores; however, contemporary views of validity suggest that there are many reasons this assumption can be faulty. In order to demonstrate just some of the problems with this view, and to support comparisons of different observation protocols across contexts, we introduce and define the conceptual tool of an observation system. We then describe psychometric evidence of a popular teacher observation instrument, Charlotte Danielson’s Framework for Teaching, in three use contexts: a lower-stakes research context, a lower-stakes practice-based context, and a higher-stakes practice-based context. Despite sharing a common instrument, we find the three observation systems and their associated use contexts combine to produce different average teacher scores, variation in score distributions, and different levels of precision in scores. However, all three systems produce higher average scores in the classroom environment domain than the instructional domain and all three sets of scores support a one-factor model, whereas the Framework posits four factors. We discuss how the dependencies between aspects of observation systems and practical constraints leave researchers with significant validation challenges and opportunities.
Teacher observations are being used for high-stakes purposes in states across the country, and administrators often serve as raters in teacher evaluation systems. This paper examines how the cognitive aspects of administrators’ use of an observation instrument, a modified version of Charlotte Danielson’s Framework for Teaching, interact with the complex and dynamic rating contexts in applied settings. Findings suggest that administrators’ rating strategies and rating approaches vary as the characteristics of the rating contexts differ. Even shortly after training (and more so as time passed), raters used reasoning strategies not supported by their training to make scoring decisions. We discuss the implications of the findings for the training of raters and the development of evaluation systems in high-stakes contexts.
Observation systems are increasingly used around the world for a variety of purposes; 2 critical purposes are to understand and to improve teaching. As observation systems differ considerably, individuals must decide what observation system to use. But the field does not have a common specification of an observation system, nor does it have systematic ways of thinking about how observation systems are similar and different. Given this reality and the renewed global interest in observation systems, this article first defines the observation system concept and then presents a framework through which to understand, categorize, and compare observation systems. We apply the framework to 4 well known observation systems that vary in important ways. The article concludes with a discussion of the results of the application of the framework and some important implications of those findings.
All 50 states use observations to evaluate practicing teachers, but we know little about how administrators actually reason when they use those observation protocols. Drawing on think-aloud and stimulated recall data, this study describes the types of strategies and warrants practicing administrators used when rating with their district’s observation protocol. Administrators in a large urban district used an observation protocol aligned to Danielson’s Framework for Teaching to rate a brief lesson clip. Administrators’ thinking was recorded, clarified, and inductively coded. Findings suggest administrator thinking and justification is complex even for short lengths of instruction. Administrators used a range of reasoning strategies, many of which were not sanctioned by their training. Exploratory analyses suggest strategy use was not related to the accuracy of ratings. Implications for the validity of teacher observation scores in high-stakes settings are considered.
Valid measurement of how students' experiences in secondary school classrooms lead to gains in learning requires a developmental approach to conceptualizing classroom processes. This article presents a potentially useful theoretical model, the Teaching Through Interactions framework, which posits teacher-student interactions as a central driver for student learning and that teacher-student interactions can be organized into three major domains. Results from 1,482 classrooms provide evidence for distinct emotional, organizational, and instructional domains of teacher-student interaction. It also appears that a three-factor structure is a better fit to observational data than alternative one- and two-domain models of teacher-student classroom interactions, and that the three-domain structure is generalizable from 6th through 12th grade. Implications for practitioners, stakeholders, and researchers are discussed.
American Journal of Respiratory and Critical Care Medicine, 2001
Primary sensitization to antigens may occur prenatally. We hypothesized that high prenatal exposure to indoor antigens increases the risk for sensitization in newborns in New York City populations with increased risk for asthma. We also investigated whether maternal sensitization is required for in utero sensitization to occur. One hundred sixty-seven pregnant African American or Dominican women residing in northern Manhattan were recruited and antigen was measured from home dust. After delivery, newborn cord and maternal blood were assayed for IgE and mononuclear cell proliferation and cytokine production in response to antigen. Cockroach and mouse antigens, but not dust mite antigens, were commonly elevated in the kitchens and pregnant mothers' beds. Increased mononuclear cell proliferation occurred in 54% of newborns in response to cockroach, 25% in response to dust mite Dermatophagoides pteronyssinus, 40% in response to dust mite D. farinae, and 34% in response to mouse protein extracts. Antigen-induced mononuclear cell proliferation occurred in cord blood even in the absence of antigen-induced mononuclear cell proliferation in the mother. Proliferation in response to antigens did not correlate with IgE levels, but proliferation in response to dust mite extracts correlated with interleukin-5 (IL-5) production in cord blood. These results suggest that (1) high prenatal exposures to cockroach and mouse antigens are prevalent; (2) in utero sensitization to multiple indoor antigens is common, occurs to a different degree than maternal sensitization, and may involve IL-5 upregulation.
The research we report examines the impact of a nationally disseminated professional development program, Developing Mathematical Ideas (DMI), on teachers' specialized knowledge for teaching mathematics. DMI participants were compared with colleagues from similar schools in the same region. Teacher knowledge was measured with two instruments: multiple-choice items developed by the Study of Instructional Improvement and open-ended items developed primarily from assessments previously used by DMI. After controlling for pretest scores on both assessments, a hierarchical linear model suggested there were statistically significant differences between the two groups; the DMI group outperformed the comparison group on both assessments. Gains in teachers' scores were related to the degree of facilitator experience with DMI. Limitations of the study and challenges associated with documenting the relationships among teacher learning, facilitator experience, and professional development program features are discussed.
This report on the second year of data collection in the Understanding Consequential Assessment Systems of Teaching (UCAST) study describes administrators’ learning during the first year that Los Angeles Unified School District’s (LAUSD) observations were part of a consequential teacher evaluation system for teachers. Drawing on mixed methods, the report provides an overview of the Teacher Growth and Development Cycle (TGDC) implementation during the 2013–2014 school year. It summarizes how much time administrators spent on TGDC-related activities and presents their reflections on the strengths and weaknesses of the TGDC implementation. The report also documents how administrators used the TGDC process with their teachers, how their use of the protocol changed over two school years, and when the observation process did and did not work well. We also review data from two in-depth cases that reflect wider themes in the sample.
This report summarizes what was learned from the first year of the Understanding Consequential Assessment Systems of Teaching (UCAST) study. The study seeks to understand how administrators learn to use the observation portion of the Los Angeles Unified School District’s (LAUSD) consequential teacher evaluation system for teachers. The report describes the 2012–2013 implementation in which more than 1,000 LAUSD building administrators were trained and certified to implement the observations that are a part of the Teacher Growth and Development Cycle (TGDC). Each administrator worked with one teacher during this gradual implementation year. Using mixed methods, the report summarizes administrators’ background characteristics, perceptions of training and of the TGDC system, results of the training and certification, and use of the TGDC system. Recommendations for revisions to the system are also included.
This paper provides a description and rationale for a performance assessment of a teaching practice, leading a classroom discussion (LCD), included in the ETS® National Observational Teaching Examination (NOTE) assessment series. In this assessment, candidates interact with a small class of virtual students represented by avatars in a computer-based, simulated classroom. The five avatars are enacted by a single simulation specialist who has been trained and certified on the particular task presented, either in elementary English language arts or mathematics. The paper defines and describes the construct of LCD, then provides a review of the research and scholarly literature that supports the importance of this practice for effective teaching, and finally describes how the construct is measured in the NOTE assessment.
This article develops a validity argument approach for use on observation protocols currently used to assess teacher quality for high-stakes personnel and professional development decisions. After defining the teaching quality domain, we articulate an interpretive argument for observation protocols. To illustrate the types of evidence that might compose a validity argument, we draw on data from a validity study of the Classroom Assessment Scoring System for secondary classrooms. Based on data from 82 Algebra classrooms, we illustrate how data from observation scores, value-added models, generalizability studies, and measures of teacher knowledge, student achievement, and teacher and student beliefs could be used to build a validity argument for observation protocols. Strengths and limitations of the validity argument approach as well as the issues the approach raises for observation protocol validity research are considered.
Chapter 20 from APA Handbook of Testing and Assessment in Psychology: Vol. 3. Testing and Assessment in School Psychology and Education, K. F. Geisinger (Editor)
Formative assessment is increasingly held up as a practice that can help teachers adjust and improve their daily instruction. But how do teachers learn to use formative assessment? This paper describes the changes teachers experienced as they went through a sustained professional development program focused on formative assessment. Teachers reported that learning formative assessment affected their professional practices, student practices, and support system. We discuss the implications of these changes for how professional development leaders listen to, support, and challenge teachers learning formative assessment.
Introduction
As a result of the ongoing standards movement and No Child Left Behind, there is a plethora of data available to teachers today. Teachers have access to standardized test scores, interim assessment outcomes, and data they gather as part of their daily teaching practice. There is so much data, however, that teachers can become overwhelmed. Frequently they do not know how to interpret those data or how to change their teaching practice in response to them. Some field experts think formative assessment could be a solution (e.g., Stiggins, 2004). Still, for formative assessment to help teachers change their classroom practices, they must learn about formative assessment, and that is where professional development (PD) comes in. Our knowledge of the subject-specific nature of high-quality PD suggests there are important learning challenges that teachers face as they engage in content-specific professional development (e.g., Cohen & Hill, 1998; Wilson & Berne, 1999). Those challenges are related to the particular cognitive demands reform-minded teaching requires (e.g., Smith, 1996). We begin with the assumption that learning formative assessment also has particular cognitive challenges.
For example, teachers must learn how to systematically collect evidence of all students' learning as that learning is happening in real time. Once teachers begin collecting such evidence, they must learn new ways of pacing, differentiating, organizing, and adapting their instruction in order to adequately address the range of student thinking in the classroom. This paper informs increasing calls for formative assessment (e.g., Black & Wiliam, 1998b; Popham, 2008; Stiggins, 2004) by describing the changes teachers saw in their professional practice as they were in the process of learning formative assessment. We discuss the implications of the changes for PD leaders. Specifically, we discuss the kinds of support teachers need as they begin to learn about formative assessment.
This policy note is a part of ETS's Equity in Education Series. The series provides a multifacete... more This policy note is a part of ETS's Equity in Education Series. The series provides a multifaceted exploration of prospects for developing indicators of teaching quality. These measures would move beyond inputs and outcomes to capture important features of what happens in the classroom that could lead to more equitable access to high-quality teaching for traditionally under-served students. This second note in the series describes the case of the teaching quality indicators in the preK area.
There are research syntheses that review what the field knows about various aspects of STEM teach... more There are research syntheses that review what the field knows about various aspects of STEM teacher preparation (e.g., National Research Council, 2000; Wilson, 2011) and reviews of teacher preparation across subjects (e.g., Cochran-Smith & Zeichner, 2013; Cochran-Smith, Villegas, Abrams, Chavez-Moreno, Mills & Stern, 2016). This review takes up yet another, related topic -- the research and measurement approaches used to study STEM teacher preparation. Drawing on recent articles in STEM and general teacher education journals, the review takes a situated perspective and categorizes the research into seven inductively developed purposes: understanding STEM preservice teacher learning and development, improving educator preparation programs, contributing to program accountability, describing and understanding relationships between STEM preparation and valued outcomes, understanding assessments and measurement of STEM preparation, framing and reframing issues of STEM preparation, and understanding teacher educators and their practices. Within each of these purposes, the review summarizes the questions and phenomena under investigation and the methodological approaches used to understand these questions and phenomena. The authors offer insights about the questions and phenomena that have not yet been addressed in each purpose and suggest varied research agendas that could help the field strengthen research on and measurement of STEM teacher preparation.
When it comes to their child’s performance, parents have a rich trove of information, including g... more When it comes to their child’s performance, parents have a rich trove of information, including grades on homework assignments, report card results, statewide standardized testing, and much more. But there’s a critical gap in that collection of data: a scarcity of information about the quality of teaching going on in the classroom.
Drawing on the model used widely in the health-care field, the authors of “Quest for Quality: An Indicator System for Teaching” propose a dashboard of indicators of teaching quality to address this crucial missing link.
This report marks the launch of the Equity in Education Series, which will provide a multifaceted exploration of prospects for developing indicators of teaching quality. Equity is a major part of this portrait of teaching. There is robust data showing large inequalities in access to good teachers across America, especially for populations with low income, minoritized students, and those for whom English is not their first language.
Educational Assessment, Evaluation, and Accountability, 2019
Researchers and practitioners sometimes presume that using a previously "validated" instrument wi... more Researchers and practitioners sometimes presume that using a previously "validated" instrument will produce "valid" scores; however, contemporary views of validity suggest that there are many reasons this assumption can be faulty. In order to demonstrate just some of the problems with this view, and to support comparisons of different observation protocols across contexts, we introduce and define the conceptual tool of an observation system. We then describe psychometric evidence of a popular teacher observation instrument, Charlotte Danielson’s Framework for Teaching, in three use contexts—a lower-stakes research context, a lower-stakes practice-based context, and a higher-stakes practice-based context. Despite sharing a common instrument, we find the three observation systems and their associated use contexts combine to produce different average teacher scores, variation in score distributions, and different levels of precision in scores. However, all three systems produce higher average scores in the classroom environment domain than the instructional domain and all three sets of scores support a one-factor model, whereas the Framework posits four factors. We discuss how the dependencies between aspects of observation systems and practical constraints leave researchers with significant validation challenges and opportunities.
Teacher observations are being used for high-stakes purposes in states across the country, and ad... more Teacher observations are being used for high-stakes purposes in states across the country, and administrators often serve as raters in teacher evaluation systems. This paper examines how the cognitive aspects of administrators’ use of an observation instrument, a modified version of Charlotte Danielson’s Framework for Teaching, interact with the complex and dynamic rating contexts in applied settings. Findings suggest that administrators’ rating strategies and rating approaches vary as the characteristics of the rating contexts differ. Even shortly after training (and more so as time passed), raters used reasoning strategies not supported by their training to make scoring decisions. We discuss the implications of the findings for the training of raters and the development of evaluation systems in high-stakes contexts.
Observation systems are increasingly used around the world for a
variety of purposes; 2 critical ... more Observation systems are increasingly used around the world for a variety of purposes; 2 critical purposes are to understand and to improve teaching. As observation systems differ considerably, individuals must decide what observation system to use. But the field does not have a common specification of an observation system, nor does it have systematic ways of thinking about how observation systems are similar and different. Given this reality and the renewed global interest in observation systems, this article first defines the observation system concept and then presents a framework through which to understand, categorize, and compare observation systems. We apply the framework to 4 well known observation systems that vary in important ways. The article concludes with a discussion of the results of the application of the framework and some important implications of those findings.
All 50 states use observations to evaluate practicing teachers, but we know little about how admi... more All 50 states use observations to evaluate practicing teachers, but we know little about how administrators actually reason when they use those observation protocols. Drawing on think-aloud and stimulated recall data, this study describes the types of strategies and warrants practicing administrators used when rating with their district’s observation protocol. Administrators in a large urban district used an observation protocol aligned to Danielson’s Framework for Teaching to rate a brief lesson clip. Administrators’ thinking was recorded, clarified, and inductively coded. Findings suggest administrator thinking and justification is complex even for short lengths of instruction. Administrators used a range of reasoning strategies, many of which were not sanctioned by their training. Exploratory analyses suggest strategy use was not related to the accuracy of ratings. Implications for the validity of teacher observation scores in high-stakes settings are considered.
Valid measurement of how students' experiences in secondary school classrooms lead to gains i... more Valid measurement of how students' experiences in secondary school classrooms lead to gains in learning requires a developmental approach to conceptualizing classroom processes. This article presents a potentially useful theoretical model, the Teaching Through Interactions framework, which posits teacher-student interactions as a central driver for student learning and that teacher-student interactions can be organized into three major domains. Results from 1,482 classrooms provide evidence for distinct emotional, organizational, and instructional domains of teacher-student interaction. It also appears that a three-factor structure is a better fit to observational data than alternative one- and two-domain models of teacher-student classroom interactions, and that the three-domain structure is generalizable from 6th through 12th grade. Implications for practitioners, stakeholders, and researchers are discussed.
American Journal of Respiratory and Critical Care Medicine, 2001
Primary sensitization to antigens may occur prenatally. We hypothesized that high prenatal exposu... more Primary sensitization to antigens may occur prenatally. We hypothesized that high prenatal exposure to indoor antigens increases the risk for sensitization in newborns in New York City populations with increased risk for asthma. We also investigated whether maternal sensitization is required for in utero sensitization to occur. One hundred sixty-seven pregnant African American or Dominican women residing in northern Manhattan were recruited and antigen was measured from home dust. After delivery, newborn cord and maternal blood were assayed for IgE and mononuclear cell proliferation and cytokine production in response to antigen. Cockroach, mouse, but not dust mite antigens, were commonly elevated in the kitchens and pregnant mothers' beds. Increased mononuclear cell proliferation occurred in 54% of newborns in response to cockroach, 25% in response to dust mite Dermatophagoides pteronyssinus, 40% in response to dust mite D. farinae, and 34% in response to mouse protein extracts. Antigen-induced mononuclear cell proliferation occurred in cord blood even in the absence of antigen-induced mononuclear cell proliferation in the mother. Proliferation in response to antigens did not correlate with IgE levels, but proliferation in response to dust mite extracts correlated with interluekin-5 (IL-5) production in cord blood. These results suggest that (1) high prenatal exposures to cockroach and mouse antigens are prevalent; (2) in utero sensitization to multiple indoor antigens is common, occurs to a different degree than maternal sensitization, and may involve IL-5 upregulation.
ABSTRACT The research we report examines the impact of a nationally disseminated professional dev... more ABSTRACT The research we report examines the impact of a nationally disseminated professional development program, Developing Mathematical Ideas (DMI), on teacher's specialized knowledge for teaching mathematics. DMI participants were compared with colleagues from similar schools in the same region. Teacher knowledge was measured with two instruments: multiple choice items developed by the Study of Instructional Improvement and open-ended items developed primarily from assessments previously used by DMI. After controlling for pretest scores on both assessments, a hierarchical linear model suggested there were statistically significant differences between the two groups; the DMI group outperformed the comparison group on both assessments. Gains in teachers' scores were related to the degree of facilitator experience with DMI. Limitations of the study and challenges associated with documenting the relationships among teacher learning, facilitator experience, and professional development program features are discussed.
This report on the second year of data collection in the Understanding Consequential Assessment Systems of Teaching (UCAST) study describes administrators’ learning during the first year that Los Angeles Unified School District’s (LAUSD) observations were part of a consequential teacher evaluation system for teachers. Drawing on mixed methods, the report provides an overview of the Teacher Growth and Development Cycle (TGDC) implementation during the 2013–2014 school year. It summarizes how much time administrators spent on TGDC-related activities and presents their reflections on the strengths and weaknesses of the TGDC implementation. The report also documents how administrators used the TGDC process with their teachers, how their use of the protocol changed over two school years, and when the observation process did and did not work well. We also review data from two in-depth cases that reflect wider themes in the sample.
This report summarizes what was learned from the first year of the Understanding Consequential Assessment Systems of Teaching (UCAST) study. The study seeks to understand how administrators learn to use the observation portion of the Los Angeles Unified School District’s (LAUSD) consequential teacher evaluation system for teachers. The report describes the 2012–2013 implementation in which more than 1,000 LAUSD building administrators were trained and certified to implement the observations that are a part of the Teacher Growth and Development Cycle (TGDC). Each administrator worked with one teacher during this gradual implementation year. Using mixed methods, the report summarizes administrators’ background characteristics, perceptions of training and of the TGDC system, results of the training and certification, and use of the TGDC system. Recommendations for revisions to the system are also included.
This paper provides a description and rationale for a performance assessment of a teaching practice, leading a classroom discussion (LCD), included in the ETS® National Observational Teaching Examination (NOTE) assessment series. In this assessment, candidates interact with a small class of virtual students represented by avatars in a computer-based, simulated classroom. The five avatars are enacted by a single simulation specialist who has been trained and certified on the particular task presented, either in elementary English language arts or mathematics. The paper defines and describes the construct of LCD, then provides a review of the research and scholarly literature that supports the importance of this practice for effective teaching, and finally describes how the construct is measured in the NOTE assessment.
This article develops a validity argument approach for use on observation protocols currently used to assess teacher quality for high-stakes personnel and professional development decisions. After defining the teaching quality domain, we articulate an interpretive argument for observation protocols. To illustrate the types of evidence that might compose a validity argument, we draw on data from a validity study of the Classroom Assessment Scoring System for secondary classrooms. Based on data from 82 Algebra classrooms, we illustrate how data from observation scores, value-added models, generalizability studies, and measures of teacher knowledge, student achievement, and teacher and student beliefs could be used to build a validity argument for observation protocols. Strengths and limitations of the validity argument approach as well as the issues the approach raises for observation protocol validity research are considered.
Chapter 20 from APA Handbook of Testing and Assessment in Psychology: Vol. 3. Testing and Assessment in School Psychology and Education, K. F. Geisinger (Editor)
Formative assessment is increasingly held up as a practice that can help teachers adjust and improve their daily instruction. But how do teachers learn to use formative assessment? This paper describes the changes teachers experienced as they went through a sustained professional development program focused on formative assessment. Teachers reported that learning formative assessment affected their professional practices, student practices, and support system. We discuss the implications of these changes for how professional development leaders listen to, support, and challenge teachers learning formative assessment.
Introduction
As a result of the ongoing standards movement and No Child Left Behind, there is a plethora of data available to teachers today. Teachers have access to standardized test scores, interim assessment outcomes, and data they gather as part of their daily teaching practice. There is so much data, however, that teachers can become overwhelmed. Frequently they do not know how to interpret or change their teaching practice in response to those data. Some field experts think formative assessment could be a solution (e.g., Stiggins, 2004). Still, for formative assessment to help teachers change their classroom practices, they must learn about formative assessment, and that is where professional development (PD) comes in. Our knowledge of the subject-specific nature of high quality PD suggests there are important learning challenges that teachers face as they engage in content-specific professional development (e.g., Cohen & Hill, 1998; Wilson & Berne, 1999). Those challenges are related to the particular cognitive demands reform-minded teaching requires (e.g., Smith, 1996). We begin with the assumption that learning formative assessment also has particular cognitive challenges.
For example, teachers must learn how to systematically collect evidence of all students' learning as that learning is happening in real time. Once teachers begin collecting such evidence, they must learn new ways of pacing, differentiating, organizing, and adapting their instruction in order to adequately address the range of student thinking in the classroom. This paper informs increasing calls for formative assessment (e.g., Black & Wiliam, 1998b; Popham, 2008; Stiggins, 2004) by describing the changes teachers saw in their professional practice as they were in the process of learning formative assessment. We discuss the implications of the changes for PD leaders. Specifically, we discuss the kinds of support teachers need as they begin to learn about formative assessment.
Papers by Courtney Bell
Drawing on the model used widely in the health-care field, the authors of “Quest for Quality: An Indicator System for Teaching” propose a dashboard of indicators of teaching quality to address this crucial missing link.
This report marks the launch of the Equity in Education Series, which will provide a multifaceted exploration of prospects for developing indicators of teaching quality. Equity is a major part of this portrait of teaching. There is robust data showing large inequalities in access to good teachers across America, especially for populations with low income, minoritized students, and those for whom English is not their first language.
The paper is available here: https://www.ets.org/s/research/pdf/quest-for-quality.pdf.
in teacher evaluation systems. This paper examines how the cognitive aspects of administrators’ use of an observation instrument, a
modified version of Charlotte Danielson’s Framework for Teaching, interact with the complex and dynamic rating contexts in applied
settings. Findings suggest that administrators’ rating strategies and rating approaches vary as the characteristics of the rating contexts
differ. Even shortly after training (and more so as time passed), raters used reasoning strategies not supported by their training to make
scoring decisions. We discuss the implications of the findings for the training of raters and the development of evaluation systems in
high-stakes contexts.
variety of purposes; 2 critical purposes are to understand and to
improve teaching. As observation systems differ considerably,
individuals must decide what observation system to use. But the
field does not have a common specification of an observation
system, nor does it have systematic ways of thinking about how
observation systems are similar and different. Given this reality
and the renewed global interest in observation systems, this article
first defines the observation system concept and then presents
a framework through which to understand, categorize, and compare
observation systems. We apply the framework to 4 well-known observation systems that vary in important ways. The
article concludes with a discussion of the results of the application
of the framework and some important implications of those
findings.