
State Policy Related to Teacher Licensure

Educational Policy, 2003

March 2002

Contacts:

Peter Youngs, Research Associate
Wisconsin Center for Education Research
Room 579, 1025 W. Johnson St.
Madison, WI 53706
(608) 263-4288 (phone)
(608) 263-6448 (fax)
payoungs@students.wisc.edu

Allan Odden, Co-Director
Consortium for Policy Research in Education
Wisconsin Center for Education Research
Room 653, 1025 W. Johnson St.
Madison, WI 53706
(608) 263-4260 (phone)
arodden@facstaff.wisc.edu

Andrew C. Porter, Director
Wisconsin Center for Education Research
Room 785, 1025 W. Johnson St.
Madison, WI 53706
(608) 263-4200 (phone)
andyp@education.wisc.edu

Author’s Note: The research reported in this paper was supported by a grant from the U.S. Department of Education, Office of Educational Research and Improvement, National Institute on Educational Governance, Finance, Policy-Making, and Management, to the Consortium for Policy Research in Education (CPRE) and the Wisconsin Center for Education Research, School of Education, University of Wisconsin-Madison (Grant No. OERI-R3086A60003). The opinions expressed are those of the authors and do not necessarily reflect the view of the National Institute on Educational Governance, Finance, Policy-Making, and Management, Office of Educational Research and Improvement, U.S. Department of Education, the institutional partners of CPRE, or the Wisconsin Center for Education Research.

Abstract

This paper is based on a study of the licensure requirements in all 50 states in the United States. For the study, data were collected on whether states require licensure candidates to pass tests of basic skills, subject matter knowledge, and/or pedagogical knowledge; whether states fund statewide induction programs; and whether states employ performance assessments in making professional licensure decisions. The purpose of the paper is to compare the licensure requirements of the 12 states that piloted portfolios developed by the Interstate New Teacher Assessment and Support Consortium (INTASC) with those of the other states to determine whether the INTASC portfolio states have taken on leadership roles in certain areas of teacher policy. The paper also examines differences among various performance assessments and considers some challenges to their use in making licensure decisions, including costs, the need for evidence related to validity and reliability, and potential effects on teacher supply.

Key Words: Teacher Assessment, Teacher Licensure, State Policy

State Policy Related to Teacher Licensure

Over the past decade, many states have developed and/or implemented new assessments for beginning teachers. These approaches to assessment include standardized, multiple-choice tests, constructed response questions, teacher-developed portfolios, classroom observations, and structured interviews. In one effort, several states collaborated through the Interstate New Teacher Assessment and Support Consortium (INTASC) to develop and pilot-test portfolios in mathematics, English/language arts, and science. For a discussion of the history and purpose of INTASC, see Porter, Youngs, & Odden (2001). This paper reports on a study that compared licensure requirements in 2001-02 in the 12 states that piloted the INTASC portfolios with those of the other 38 states to determine whether the INTASC portfolio states have taken on leadership roles in certain areas of teacher policy.
In particular, the study examined state policies regarding teacher testing for initial licensure, new teacher induction, and the use of performance assessments in making professional licensure decisions. In this paper, we use the terms “licensure” and “certification” as they are commonly used in other professions. More specifically, the term licensure is employed to describe state decisions regarding initial admission to practice while “certification” is used to refer to the National Board for Professional Teaching Standards’ efforts to certify accomplished practice. Further, “initial licensure” requirements are defined as those that teaching candidates must meet in order to begin working as classroom teachers in public schools and “professional licensure” requirements are defined as those that practicing teachers must meet in order to continue teaching in public schools. The study found that the INTASC portfolio states were more likely than other states to test teaching candidates’ knowledge of subject matter and pedagogy, train mentors and fund statewide induction programs, and use performance assessments with initially-licensed teachers. The data for the study come from phone interviews or electronic communication with state education officials. In fall 2001, the Consortium for Policy Research in Education (CPRE) gathered information from all 50 states about their initial and professional licensure requirements and their policies regarding induction. This paper is based on the study and has five sections. Section one examines whether states require teaching candidates to pass tests of basic literacy skills, subject matter knowledge, and/or pedagogical knowledge in order to earn an initial teaching license. This section also reviews research on the relationship of teachers’ performance on licensure tests to student achievement, and considers issues related to test quality. In section two, the paper considers the extent to which states have required and funded statewide induction programs for new teachers. More specifically, the paper reports on the amount of state funding for mentoring and other induction activities, the nature of state-provided mentor training, and whether performance assessments are used in induction. Section three describes the use of performance assessments in several states in making professional licensure decisions, and examines differences among various assessments. Two states, Connecticut and North Carolina, currently use content-specific portfolios for licensure decisions while several others employ classroom observations and interviews or local evaluation procedures. The use of performance assessments for formative purposes in California is also discussed as well as state policies requiring institutions of higher education to incorporate performance assessments into teacher preparation programs. The fourth section considers some challenges to the use of performance assessments in making licensure decisions, including costs, the need to establish evidence related to validity and reliability, and potential effects on teacher supply. Finally, the paper ends with a summary and conclusion. I. State Policy Regarding the Testing of Basic Skills, Content Knowledge, and Pedagogical Knowledge The use of tests of basic skills, content knowledge, and pedagogical knowledge in making initial licensure decisions has been the subject of much debate and controversy among educators, researchers, and policy makers. 
On one hand, researchers have found strong relationships between student achievement and teachers’ performance on some licensure tests or comparable teacher tests (e.g., Ehrenberg & Brewer, 1995; Ferguson, 1991; Hanushek, 1992). In addition, many policy makers feel that licensure tests are an appropriate way to ensure that teachers have adequate literacy skills, subject matter knowledge, and knowledge of teaching. On the other hand, test developers and states that implement teacher tests have sometimes been criticized for providing insufficient evidence of test validity and reliability and for not adequately addressing issues related to fairness (Haney, Madaus, & Kreitzer, 1987; Haertel, 1991; Smith, Miller, & Joy, 1988). This section describes tests of basic skills, content knowledge, and pedagogical knowledge developed by two national organizations, the Educational Testing Service (ETS) and National Evaluation Systems (NES), and considers the extent to which the INTASC portfolio states and other states employed these and comparable tests in making initial licensure decisions in 2001-02. We also discuss research on the relationship between teachers’ test performance and student achievement, as well as other relevant research, and we note several issues related to test quality that states must address in implementing licensure tests. Tests of Basic Skills In 2001-02, teaching candidates in nine of the 12 states (75%) that piloted INTASC portfolios were required to pass a test of basic skills either to gain admission to a teacher education program or to earn an initial teaching license (See Table 1). (The 12 states that piloted the INTASC portfolios from 1996 to 1999 were Connecticut, Delaware, Illinois, Indiana, Kentucky, Louisiana, New York, Ohio, Pennsylvania, Rhode Island, South Carolina, and Texas.) Of these nine states, six employed Praxis I, developed by ETS, while three contracted with NES or used their own state-developed tests. Of the other 38 states, 28 (73.7%) required teaching candidates to pass basic skills tests; the majority of these used Praxis I. In sum, there was no significant difference in the use of basic skills tests between INTASC portfolio states and other states. The Praxis I assessment measures basic skills in reading, writing, and mathematics and comes in two versions: a paper-and-pencil version, known as the Pre-Professional Skills Test (PPST), and a computer version, known as the Computer-Based Test (CBT). Both versions include multiple-choice questions and an essay. Basic skills tests developed by NES for individual states include the Massachusetts Educator Certification Test (MECT) Communication and Literacy Skills Test, the New Mexico Assessment of Basic Skills, and the Oklahoma General Education Test. For example, the MECT Communication and Literacy Skills Test includes reading and writing subtests that evaluate reading comprehension, vocabulary, grammar, and essay writing through the use of multiple-choice and open-ended items. Due to the recent implementation of Praxis I and many of the basic skills tests developed by NES, there has been little research on the relationship between teachers’ performance on these tests and student achievement or other outcomes. Research on earlier tests of verbal skills (Ehrenberg & Brewer, 1995; Hanushek, 1992) and of reading skills and professional knowledge (Ferguson, 1991), though, indicates that teachers’ performance on such tests is related to gains in student achievement.
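As a rough illustration of the comparison just reported, the Table 1 counts can be arranged as a 2x2 table and checked for association. The paper does not indicate what statistical procedure, if any, lies behind its statement of "no significant difference"; the short Python sketch below is purely illustrative and simply applies a Fisher exact test, one reasonable choice for groups this small, to the basic skills counts as given in the text.

```python
# Counts as reported in the text (Table 1): 9 of 12 INTASC portfolio states
# and 28 of 38 other states required a basic skills test in 2001-02.
from scipy.stats import fisher_exact

intasc_yes, intasc_total = 9, 12
other_yes, other_total = 28, 38

table = [
    [intasc_yes, intasc_total - intasc_yes],  # INTASC portfolio states: yes, no
    [other_yes, other_total - other_yes],     # other states: yes, no
]

print(f"INTASC portfolio states: {intasc_yes}/{intasc_total} = {intasc_yes / intasc_total:.1%}")
print(f"Other states:            {other_yes}/{other_total} = {other_yes / other_total:.1%}")

# fisher_exact returns the odds ratio and a two-sided p-value for the 2x2 table.
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"Fisher exact test: odds ratio = {odds_ratio:.2f}, p = {p_value:.2f}")
```

The same calculation could be repeated for the content knowledge and pedagogical knowledge counts reported later in this section.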
In these earlier studies, Ehrenberg and Brewer used teachers’ scores on a verbal skills test from the Equality of Educational Opportunity Study (EEO) (Coleman et al., 1966) while Hanushek employed teachers’ scores on a word test from a welfare reform experiment in Gary, Indiana, in the 1970s. Ferguson’s analysis drew on teachers’ scores on the Texas Examination of Current Administrators and Teachers (TECAT), used in Texas during the 1980s and 1990s, which measured teachers’ reading skills and professional knowledge. See Wayne and Youngs (under review) for a review of research on teachers’ performance on licensure tests and other tests of basic skills and student performance gains. This review only included studies that controlled for students’ socio-economic status and prior achievement. In developing basic skills tests (as well as tests of content knowledge and pedagogical knowledge) and using them for licensure decisions, states and testing companies must provide evidence related to test validity and reliability. One central aspect of validity involves job analyses. States and testing companies are expected to conduct job analyses in order to determine the knowledge and skills teachers need in order to practice effectively. Such analyses typically include practitioner surveys regarding the importance of different teaching skills, the accuracy of definitions, and the appropriateness of test questions; observations of teachers’ classroom practices; interviews with teachers; and literature reviews that examine teachers’ job activities (Melnick & Pullin, 2000). Additional ways to demonstrate validity-related evidence include establishing relationships 1) between test content and state K-12 content standards and/or accreditation requirements for teacher preparation programs and 2) between teachers’ test scores and later measures of their performance (e.g., student achievement gains or supervisor evaluations). In terms of reliability, test developers and users need to ensure evidence of consistency across test items and performances. Tests of Content Knowledge The study found that most of the INTASC portfolio states were assessing new teachers’ knowledge of subject matter. As of fall 2001, 10 of the 12 states (83.3%) that piloted INTASC portfolios required teaching candidates to pass tests of subject matter knowledge to earn an initial or professional teaching license (See Table 1). Six of these states used the Praxis II tests of content knowledge while the other four employed tests developed by NES or their own state-developed tests. Of the other 38 states, 23 (60.5%) required teaching candidates to pass content knowledge tests, and the majority of these used Praxis II tests. While the INTASC portfolio states were more likely than other states to use tests to assess teachers’ content knowledge, the results indicate that the majority of states in both groups employed subject matter knowledge tests in making licensure decisions. Praxis II includes two types of content tests: core tests of content knowledge and in-depth tests of content knowledge. The core tests are two hours in length and include only multiple-choice questions while the in-depth tests feature constructed response questions. Subject matter tests developed by NES for individual states include the Arizona Educator Proficiency Assessments (AEPA) Subject Knowledge Tests and the Program for Licensing Assessments for Colorado Educators (PLACE) Content Area Tests.
For example, the AEPA Subject Knowledge Tests include both multiple-choice and constructed response questions. There has been little research on the relationship between teachers’ performance on 1) Praxis II content tests or NES-developed subject matter tests and 2) student achievement or other outcomes. Since these tests are designed to measure teaching candidates’ knowledge of subject matter, it can be instructive to consider research on the effect on achievement of teachers’ degrees and coursework, which are also proxies for content knowledge. Controlling for students’ socio-economic status and prior achievement, studies of teachers’ degree level have found that students of teachers with undergraduate or graduate degrees in math have higher achievement gains in this content area (e.g., Goldhaber & Brewer, 1997, 2000; Rowan, Chiang, & Miller, 1997). Further, research with similar controls has found relationships between teachers’ coursetaking in math and student performance in this area (e.g., Monk & King, 1994). At the same time, these and other studies have not identified determinate relationships (positive or negative) in English or social studies between student achievement and either degree levels or coursetaking. Further, research on the impact of coursework in science on achievement has generated mixed results (Monk & King, 1994). See Wayne and Youngs (under review) for a review of research on teachers’ degree levels and coursetaking and student performance gains. This review only included studies that controlled for students’ socio-economic status and prior achievement. While some might interpret these findings as indicating that teachers’ degree levels and coursetaking do not matter, at least in areas other than math, we offer a different perspective. In our view, it is necessary for teachers to be knowledgeable about the subjects they teach in order to promote student achievement, but it may not be sufficient. Instead, teachers may also need general knowledge of pedagogy as well as subject-specific pedagogical knowledge. In other words, future research should examine the relationships between student achievement and 1) teachers’ performances on the new content tests as well as 2) teachers’ performances on measures of pedagogical content knowledge. We return to these points below. Tests of Pedagogical Knowledge In 2001-02, teaching candidates in 10 of the 12 states (83.3%) that piloted INTASC portfolios were required to pass a test of pedagogical knowledge to earn an initial teaching license (See Table 1). Of these states, six employed the Praxis II Principles of Learning and Teaching (PLT) developed by ETS while four contracted with NES or used their own state-developed tests. (Tests of pedagogical knowledge are also referred to as tests of professional knowledge.) Of the other 38 states, only 16 (42.1%) required teaching candidates to pass pedagogy tests; the majority of these used Praxis II PLT. This finding indicates that the INTASC portfolio states were more likely than other states to use tests to assess teachers’ pedagogical knowledge. The Praxis II PLT tests include multiple-choice and constructed response questions and are associated with different levels of schooling (e.g., grades K-6, grades 5-9, and grades 7-12). Tests of pedagogical knowledge developed by NES for individual states include the AEPA Professional Knowledge Tests, the New Mexico Assessments of General Knowledge and Teacher Competency, and the Oklahoma Professional Teacher Examination.
For example, the AEPA Professional Knowledge Tests use multiple-choice and constructed response questions to assess candidates’ pedagogical knowledge and skills. There has been little research on the relationship between teachers’ performance on Praxis II PLT or the NES-developed tests of pedagogical knowledge and student achievement. Earlier studies, though, suggest that teachers’ pedagogical knowledge is related to student performance (e.g., Ferguson, 1991). Further, it is instructive to consider research on teachers’ licensure status; since teaching candidates in all states must complete coursework in education in order to earn a standard license, licensure status can serve as a proxy for teachers’ pedagogical knowledge. Controlling for students’ socio-economic status and prior achievement, Goldhaber and Brewer (2000) found that math students whose teachers have standard licensure had significantly greater achievement gains than students whose teachers held private school licensure or no licensure. While Goldhaber and Brewer found no difference (in terms of student performance) between teachers with standard licensure and those with emergency licensure, Darling-Hammond, Berry, and Thoreson (2001) note that most teachers with emergency licenses are those who have recently moved from out-of-state or who have recently switched teaching assignments; i.e., they are most likely veteran teachers who already hold some sort of standard license signifying the completion of educational coursework. In science, Goldhaber and Brewer (2000) also found that students of teachers with standard licensure achieved more than those whose teachers were either not licensed in their subject or held private school licensure, but these findings were not as strong in magnitude or statistical significance as their findings in math. In English and social studies, there have been few studies with controls for students’ socioeconomic status or prior achievement that have compared the effect on achievement of teachers holding standard licensure with that of teachers holding emergency licensure, private school licensure, or no licensure. We agree with others (e.g., Goldhaber & Brewer, 2001; Darling-Hammond, Berry, & Thoreson, 2001) who have recommended further research on the effects of licensure status on student performance and other outcomes. At the same time, the indeterminate findings regarding pedagogical knowledge (as represented by licensure status) in English and social studies may indicate that knowledge of pedagogy is necessary in order to promote student learning, but not sufficient. In summary, the INTASC portfolio states were more likely than other states in 2001-02 to test the subject matter knowledge and pedagogical knowledge of teaching candidates while there was no significant difference in the use of basic skills tests between INTASC portfolio states and other states. II. State Policy Related to New Teacher Induction A second important area of teacher licensure policy is induction for beginning teachers. Many school districts face difficulties in recruiting and retaining qualified teachers. Nationwide, more than one-fifth of public school teachers leave their positions within three years (National Center for Education Statistics, 1997), and several studies estimate teacher attrition in the first five years to be between 30 and 50 percent (Grissmer & Kirby, 1987; Murnane et al., 1991).
Some teachers leave the profession for personal or financial reasons while others move from one district to another in search of better working conditions. To increase the likelihood that new teachers will remain in the profession and develop effective instructional practices, a number of states have implemented and funded induction programs. Through mentoring, workshops, opportunities to visit other classrooms, and the use of performance assessments, these programs can help beginning teachers refine skills in classroom management, lesson planning, and assessing student learning. This section examines the extent to which the INTASC portfolio states and other states have required and funded statewide induction programs for new teachers. We discuss the amount of state funding for mentoring and other induction activities, the nature of state training for mentors, and whether performance assessments are used in induction. In 2001-02, five of the 12 INTASC portfolio states (41.7%) and nine of the other 38 states (23.7%) had implemented statewide induction programs for new teachers and were providing funds for mentors and/or other induction activities (See Table 1). While this finding suggests that the INTASC portfolio states were more likely to provide funded induction support for beginning teachers, it is important to examine the quality of these programs. One important aspect of program quality is the amount of state funding for mentoring and other induction support activities. Among the INTASC portfolio states that required and funded induction programs, the amount of state funding per new teacher ranged from $300 in Connecticut ($200 per new teacher to districts and $100 per new teacher for BEST seminars) to $1,600 in Kentucky ($1400 for each mentor and $200 for each teacher educator involved in induction). Among the non-portfolio states with required, state-funded programs, most spent between $600 and $1,200 per new teacher. For example, West Virginia compensated mentors $600, North Carolina paid them $1,000 each, and Arkansas paid them $1,200 each. An exception was California, which allocated $3,600 for each new teacher and required local induction programs to match these funds with an additional $2,000. With $5,600 for each beginning teacher, many local induction programs used these funds to compensate mentors (known as support providers), offer release time to new teachers and mentors, provide workshops to beginning teachers, and compensate program administrators. A second aspect of program quality involves the nature of mentor training. Mentors may be more likely to help new teachers develop their instructional skills when they have participated in training which 1) addresses strategies for instructional coaching and promoting reflective inquiry, 2) is based on state teaching standards and helps mentors relate instructional practice to the standards, and 3) addresses strategies for using teacher performance assessments to promote new teacher development. In Connecticut, for example, mentors are required to participate in 24 hours of state-provided support teacher training which addresses strategies for promoting reflective practice, the state’s teaching standards, and the state’s portfolio requirements. Similarly, mentors in Kentucky and Louisiana must participate in training that addresses instructional coaching, their states’ teaching standards, and the assessment instruments used in their states. 
In contrast, while the other INTASC portfolio states (with required, state-funded induction programs) – Indiana and South Carolina - train mentors to use their states’ teaching standards in working with new teachers, they do not prepare them to use performance assessments in their work with new teachers. Among the non-portfolio states with required, state-funded programs, all nine train mentors to collaborate and reflect on practice and to draw on state teaching standards in their work with new teachers. Only four of the nine, though, prepare mentors to use performance assessments in providing support to beginning teachers. Arkansas and California both train mentors to use such assessments for formative purposes. In Arkansas, mentors are trained to use Pathwise, developed by ETS, to identify new teachers’ strengths and weaknesses. For this assessment, teachers complete a class profile and an instructional plan and are observed and interviewed by their mentor. Pathwise includes an observation form, on which mentors can write suggestions, and an instruction and reflection profile, on which teachers can reflect in writing on their practice. The use of Pathwise helps prepare new teachers for a summative assessment process involving Praxis III, described in the next section. Support providers in California are trained to use the California Formative Assessment and Support System for Teachers (CFASST), which is also described in the next section. Two other non-INTASC portfolio states, Oklahoma and North Carolina, employ performance assessments for making licensure decisions with new teachers. In Oklahoma, new teachers work with a residency team during their first year, which includes a mentor, an administrator, and a representative from the institution of higher education (IHE) where the teacher went through teacher preparation. Mentors, administrators, and IHE representatives use the Resident Teacher Observation Instrument to evaluate first-year teachers in the areas of teaching and assessment, classroom management, human relations, and professionalism. Mentors in North Carolina are trained to assist first-year teachers with issues related to curriculum, instruction, and classroom management; and to support second-year teachers as they complete a Performance-Based Product (PBP), which is described in the next section. In sum, the INTASC portfolio states were more likely to have funded statewide induction programs in 2001-02 than the other states. Within both groups, though, the quality of these programs varied with regard to the amount of state funding for mentoring and other induction activities, the nature of mentor training, and whether the programs involved the use of performance assessments. The next section examines the nature of such assessments and their use in making licensure decisions. III. State Policy Regarding Teacher Performance Assessment In the 1990s, INTASC, under the leadership of Connecticut, and ETS developed new approaches to the assessment of practicing teachers that are designed to better measure complex teaching performance than earlier teacher tests. These new approaches are performance-based and occur during the first three years of teaching. In particular, they feature such assessment strategies as teacher-developed portfolios, classroom observations, and structured interviews, and they are based on different conceptions of teaching than earlier assessments. 
After reviewing performance assessments for beginning teachers in use in the 1980s, this section reports on the number of states that currently use performance assessments in making professional licensure decisions and describes several assessments that are being employed in this way. Teacher Performance Assessments in the 1980s In the 1980s, in part due to dissatisfaction with standardized, multiple-choice tests, several states implemented performance assessments of novice teachers. In these states, candidates were granted an initial license after they completed their teacher education course requirements and student teaching and passed tests of basic skills, content knowledge, and/or knowledge of teaching. They did not qualify for a professional license, though, until they had attained a teaching position and passed an on-the-job performance assessment. Among the more prominent of these assessments were the Florida Performance Measurement System (FPMS), the Georgia Teacher Performance Assessment Instruments (TPAI), and the Texas Appraisal System (TAS). These assessments were viewed favorably by many educators, researchers, and policy makers because they were based on empirical research, particularly process/product research on teaching (Kuligowski, Holdzkom, & French, 1993; Pecheone & Stansbury, 1996). By drawing their assessment criteria from the same research base, Florida, Georgia, Texas, and other states were able to promote a common understanding of teaching across the region and to provide educators within and across states with a common language for discussing teaching practice (Kuligowski, Holdzkom, & French, 1993). Process/product research also influenced the data-collection methods used in these assessment systems. In some states, low-inference systems were used. The FPMS, for example, focused on the frequency of different teaching behaviors. Other states, including Georgia and Texas, employed higher-inference or modified scripting procedures in which assessors used five-point scales to measure the quality of teaching they observed (Pecheone & Stansbury, 1996). While the FPMS, TPAI, TAS, and other similar performance assessments were fairly popular, they were also criticized for reinforcing a narrow conception of teaching. By focusing on a uniform set of teaching behaviors and strategies, regardless of the content area or grade level being taught, these assessments may have led teachers to follow a fixed set of prescriptions. In the FPMS, for example, the “observers recorded the frequencies of specific behaviors in two columns – one for ‘effective’ behaviors, the other for ‘ineffective’” without taking account of contextual factors (Darling-Hammond, Wise, & Klein, 1995, p. 64). Consequently, in this and similar systems, teachers could have been discouraged “from adapting their instruction to the particular subjects and students they were teaching” (Floden & Klinzing, 1990). New Advances in Beginning Teacher Performance Assessments The newer assessments developed by Connecticut/INTASC, ETS, and several states are based on different conceptions of teaching than earlier assessments. As of 2001-02, five of the 12 states (41.7%) that piloted INTASC portfolios required candidates to pass assessments of classroom performance that involved evaluators from outside their schools (See Table 1). These states were Connecticut, Kentucky, Louisiana, New York, and South Carolina.
Connecticut had developed and implemented content-specific portfolios in several subject areas, while Louisiana, Kentucky, and South Carolina were using classroom observations in making licensure decisions. (In the early 1990s, Connecticut developed prototype content-specific portfolios for new teachers; based on a grant with INTASC in the mid-1990s, it expanded portfolio development into almost all licensure areas.) New York’s performance assessment consisted of a 30-minute videotaped sample of the candidate teaching in his or her classroom. In addition, two other INTASC portfolio states were planning to implement performance assessments for use in making licensure decisions within the next two to three years. Indiana intended to use the INTASC portfolios with second-year teachers while Ohio was implementing Praxis III (described below) in 2002-03 with first-year teachers. In 2001-02, only four of the other 38 states (10.5%) - North Carolina, Arkansas, Oklahoma, and Washington - required candidates to pass a performance assessment involving evaluators from outside their schools. North Carolina required second-year teachers to complete a Performance-Based Product, which was similar to but less extensive than the Connecticut portfolios, and Arkansas had implemented Praxis III with first-year teachers. For their part, Oklahoma and Washington employed classroom observations. Some of the performance assessments employed by INTASC portfolio states and other states are described in the rest of this section. In 2001-02, second-year teachers in Connecticut in ten content areas were required to earn passing scores on content-specific portfolios in order to earn a professional license. These areas included science, math, English/language arts, social studies, elementary education, special education, visual art, music, physical education, and world languages. For each portfolio, teachers had to complete several entries that were integrated around one or two units of instruction. These entries included a description of their teaching context, a set of lesson plans, two videotapes of instruction during the unit(s), samples of student work, and written reflections on their planning, instruction, and assessment of student progress. As part of the portfolio, each teacher needed to focus on two students during the unit(s) and write about how they would modify their instruction and assessment practices to address the needs of these students. Each portfolio was scored on a 0-to-4 scale by classroom teachers who had been trained by the state as assessors. The assessors scored the portfolios in relation to a set of guiding questions based on content-specific state teaching standards. Those candidates who earned non-passing scores (0 or 1) had the opportunity to go through the portfolio process again during their third year of teaching. If their performance on the portfolio remained unsatisfactory, they would be ineligible for a professional license and would not be able to continue teaching in Connecticut public schools. North Carolina used a similar performance assessment, the Performance-Based Product (PBP), for licensure decisions in 2001-02.
For this assessment, teachers needed to include demographic and descriptive information about themselves, their school, and their teaching assignment; a unit and five lessons with relevant assessment information and examples of student work; an analysis of the academic work and achievement data of one student and how the teacher addressed this student’s needs during the unit; written reflections on each component of the PBP; and an edited videotape of classroom instruction. The PBPs were evaluated on a 2-to-4 scale, in relation to more than 50 indicators in three areas (instructional practice, unique learner needs, and classroom climate), by teachers and administrators who had been trained as assessors. A “2” was below standard, a “3” was at standard, and a “4” was above standard. Each PBP was scored independently by two assessors, at least one of whom was licensed in the area in which licensure was sought and at least one of whom was a classroom teacher. To meet the standard for recommendation for licensure, the teacher’s composite score had to total at least 318 points. Those teachers whose scores were below 318 points would have two opportunities in their third year to improve those parts of the PBP (instructional practice, unique learner needs, and/or classroom climate) that were below standard. If the PBP did not meet standard (at least 318 points) by the end of the third year, the teacher would not be recommended for professional licensure. Another non-INTASC portfolio state, Arkansas, used Praxis III with first-year teachers in 2001-02 in making licensure decisions. For Praxis III, each teacher completes a class profile and an instruction profile and is then interviewed and observed by a trained assessor. In the pre-observation interview, teachers are asked several questions about their goals, instructional methods, activities, and materials, and how they acquire knowledge about their students’ prior knowledge and skills. During the observation, the assessor records key aspects of what the teacher and students say and do that are related to the 19 criteria that underlie Praxis III. These notes, taken by the assessor during the observation, are to be objective and descriptive; at this point, the assessor is not forming judgments. In the post-observation interview, the teacher is asked to reflect on how the lesson went and whether they would do anything differently if they were to teach the lesson to the same class again. After the post-observation interview is completed, the assessor reviews all of the notes taken during the observation and interviews along with the information from the class profile and instruction profile. The assessor then determines what evidence, positive or negative, exists for each of the 19 criteria, selects the most salient evidence of performance for each, and transfers it to the Record-of-Evidence form. Finally, the assessor writes a summary statement for each criterion, which links the evidence to the scoring rules for that criterion, and assigns a score from 1.0 to 3.5 for each criterion. All of the activities associated with the assessment of a single lesson comprise an assessment cycle. Arkansas and other states that use Praxis III in making licensure decisions are required by ETS to administer the assessment at least twice during each candidate’s first year of teaching. Four INTASC portfolio states employed classroom observations in 2001-02 in making professional licensure decisions.
In Kentucky, for example, all first-year teachers (interns) were required to participate in the Kentucky Teacher Internship Program (KTIP). Over the course of the school year, interns would receive assistance from a committee that included a mentor, the principal, and a teacher educator. Committee members were trained by the state to use the KTIP observation instrument and they were expected to observe the intern three times over the course of the year. In their observations, committee members assigned individual scores for the teacher’s performance in relation to indicators associated with each of the state teaching standards as well as a holistic score for their performance in relation to each standard. At the end of the year, the committee was to determine whether the intern’s teaching had been satisfactory – and, therefore, qualified them for a professional license - based on their performance on the instrument and response to feedback throughout the year. Similarly, first-year teachers in Louisiana were evaluated in 2001-02 during their second semester by an assessment team consisting of the principal and an assessor from outside their school. The principal and external assessor were expected to conduct structured interviews, classroom observations, and post-observation conferences, and then make a recommendation to the state regarding licensure. The approaches to new teacher assessment in Kentucky and Louisiana were similar to Praxis III in that they involved classroom observations as opposed to videotaped lessons, samples of student work, and written reflections on practice. In contrast to Praxis III, though, both states employed assessment teams that included the beginning teacher’s principal. In summary, these findings suggest that the INTASC portfolio states were more likely than other states to employ performance assessments in making licensure decisions in 2001-02, although the difference was not statistically significant. The findings also indicate that states in both groups employ a variety of different types of assessments. The next section considers some of the differences among various approaches to performance assessment and their potential consequences. IV. Differences Among Performance Assessments There are several important differences between content-specific portfolio assessments and those systems that feature observations and interviews. One important difference is that Praxis III and other similar systems involve live observations while the Connecticut portfolios and PBPs in North Carolina require teachers to submit videotapes of their instruction. Praxis III and other observation-based systems enable assessors to see what the teacher and all of the students are doing and to evaluate the teacher’s management skills with regard to the entire class. In contrast, by requiring teachers to include videotapes of classroom instruction, the Connecticut and North Carolina assessments are less likely to provide information about all that is happening, including teachers’ management skills, with regard to an entire class. Further, the quality of the video may vary from teacher to teacher in ways unrelated to teaching quality. At the same time, the use of video allows for the use of multiple assessors who can review a teacher’s instruction at their own pace (Porter, Youngs, & Odden, 2001). 
A second noteworthy difference is that the Connecticut and North Carolina assessments are content-specific, and the scorers who evaluate them must have teaching experience in the same content area and at the same level of schooling as those they are assessing. In contrast, Praxis III and other observation systems are designed for use in all subject areas and at all developmental levels, and assessors do not necessarily have teaching experience in the same subject areas as those they assess. As a result, one would expect the Connecticut and North Carolina assessments to more fully reveal whether beginning teachers practice in ways that reflect pedagogical content knowledge; i.e., the most appropriate ways of presenting subject matter to students, their most common misconceptions, and areas they find the most difficult. It is important for teachers to be able to construct and apply pedagogical content knowledge because theoretical and empirical research suggests that it is associated with effective teaching (Bransford, Brown, & Cocking, 1999; Darling-Hammond, 1998; Fennema et al., 1996; Grossman, 1990; Resnick, 1987; Shulman, 1987). Even when observation systems are supplemented with paper-and-pencil tests of pedagogical content knowledge, content-specific performance assessments more fully reveal whether new teachers’ practices reflect such knowledge (Porter, Youngs, & Odden, 2001). This is because such paper-and-pencil tests are based on constructed response questions rather than analyses of actual teaching practice. It may be, though, that paper-and-pencil tests of pedagogical content knowledge provide a more complete assessment of teacher knowledge in this important area. A third major difference is that the Connecticut and North Carolina assessment exercises are integrated around one or two instructional units while the multiple observations that occur in Praxis III and other observation systems are not necessarily part of the same unit. By requiring teachers to complete a series of lesson plans for a unit of instruction and write a commentary about them, the content-specific portfolios are able to assess teachers’ abilities to plan multiple tasks across several lessons within the same unit and multiple means of assessing students. In contrast, observation-based systems provide information related to individual lessons and single assessments (Porter, Youngs, & Odden, 2001). V. Challenges to Performance-Based Licensure Despite the promise of performance assessments to promote more effective teaching and improved student achievement, states face several challenges in implementing such assessments for use in professional licensure decisions. These challenges include costs, the need to provide evidence of validity and reliability and ensure fairness, and pressure to lower standards for entry into the profession due to teacher shortages. In response to these issues, some of the states that piloted the INTASC portfolios have elected not to use them in making licensure decisions. This section discusses these issues and then describes two states, California and Wisconsin, that use performance assessments for other purposes. Costs One challenge in implementing performance assessments in making licensure decisions has to do with their costs. The primary costs include developing or purchasing an assessment system, pilot testing and validating the system, training mentor teachers, training assessors, administering proficiency tests for assessor candidates, and administering and scoring the assessment.
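To make this cost structure concrete, the sketch below decomposes a per-candidate cost into the components just listed. Every dollar figure is hypothetical (the paper reports no such figures); the point is only to show how one-time development and validation costs, spread over many candidates, combine with recurring per-candidate costs for mentoring, assessor training, and scoring.

```python
# A back-of-the-envelope sketch of the cost components named in the text.
# All figures are hypothetical; none come from the paper.

def per_candidate_cost(
    development_cost: float,        # one-time: developing or purchasing the system
    pilot_validation_cost: float,   # one-time: pilot testing and validity studies
    candidates_over_lifetime: int,  # candidates assessed over the system's life
    mentor_training: float,         # recurring, per candidate
    assessor_training: float,       # recurring, per-candidate share
    proficiency_testing: float,     # recurring, per-candidate share
    admin_and_scoring: float,       # recurring, per candidate
) -> float:
    one_time = (development_cost + pilot_validation_cost) / candidates_over_lifetime
    recurring = mentor_training + assessor_training + proficiency_testing + admin_and_scoring
    return one_time + recurring

# Hypothetical figures for illustration only.
cost = per_candidate_cost(
    development_cost=2_000_000,
    pilot_validation_cost=500_000,
    candidates_over_lifetime=20_000,
    mentor_training=200,
    assessor_training=100,
    proficiency_testing=25,
    admin_and_scoring=300,
)
print(f"Estimated cost per candidate: ${cost:,.0f}")  # $750 under these assumptions
```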
It would cost states much more to develop and implement content-specific teaching standards and portfolios (featuring lesson plans, videotapes of instruction, samples of student work, and written reflections) than to purchase and implement Praxis III or to develop and implement their own observation instruments. At the same time, the costs of implementing standards-based portfolios could be lowered by adapting the INTASC standards and implementing the Connecticut portfolios. The costs of training mentors and assessors, administering assessor proficiency tests, and evaluating assessments depend primarily on how much mentors and assessors are paid to participate in these activities. Therefore, states can lower these costs by reducing what they pay to mentors and/or assessors. However, reducing or eliminating stipends for mentors or assessors could reduce the incentive for experienced teachers to serve in these roles, which could, in turn, affect the quality of mentoring and assessment. Validity, Reliability, and Fairness Another challenge involves providing evidence of validity and reliability and ensuring fairness. With regard to validity, developers and users of performance assessments must ensure evidence based on test content, criterion-related evidence, evidence based on response processes, and evidence based on the consequences of testing. In terms of reliability, there needs to be evidence of consistency of scores across raters or rater pairs; consistency of scores across lessons, exercises, or portfolios; and the precision of classification decisions. Finally, with regard to fairness, assessment developers and users must consider whether the use of performance assessments results in bias or adverse impact (Klein, 1998; Messick, 1989; Moss, 1998a; Porter, Youngs, & Odden, 2001). In developing Praxis III, ETS conducted a series of studies that provided content-related evidence of validity (see Dwyer, 1994), validity evidence based on response processes, and evidence of inter-rater reliability (Livingston, 1993). Similarly, in developing its portfolios, INTASC has documented content-related evidence of validity and validity evidence based on response processes, and undertaken studies of inter-rater pair reliability as well as criterion studies of the relationship between teachers’ performance on portfolios and variables external to the assessment (Moss, 1998b). States that pilot and implement performance assessments for use in making licensure decisions must supplement such studies with research on the consequences of their use, particularly whether they result in bias or adverse impact. The potential for adverse impact on racial minorities is of particular concern given the fact that several researchers found that previous licensure tests had an adverse impact on such minorities (Goertz & Pitcher, 1985; Graham, 1987; Smith, Miller, & Joy, 1988). The most recent court decision regarding licensure testing indicates that states can ensure the legal defensibility of licensure tests by providing evidence related to validity and reliability (Association of Mexican-American Educators v. California, 1999; also see Melnick & Pullin 2000). At the same time, because performance assessments are significantly different than paper-and-pencil tests, they raise new challenges in test use. Consequently, states should ensure fairness by conducting process-oriented studies of candidates’ opportunity to learn and access to support (e.g., Collins, Schutz, & Moss, 1997). 
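The reliability evidence described above, consistency of scores across raters or rater pairs, can be summarized in several ways. The sketch below computes two common summaries, exact agreement and Cohen's kappa, for a set of invented paired scores on a 0-to-4 scale like Connecticut's; it is meant only as an illustration, not as the procedure INTASC or ETS actually used.

```python
# Two simple inter-rater consistency summaries for portfolio scores on a
# 0-to-4 scale. The scores are invented for illustration.
from collections import Counter

rater_a = [3, 2, 4, 1, 3, 2, 2, 3, 0, 4, 3, 2]
rater_b = [3, 2, 3, 1, 3, 2, 1, 3, 0, 4, 3, 3]
n = len(rater_a)

# Proportion of portfolios on which the two raters gave identical scores.
exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Cohen's kappa: agreement corrected for the agreement expected by chance.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
expected = sum((freq_a[s] / n) * (freq_b[s] / n) for s in range(5))
kappa = (exact_agreement - expected) / (1 - expected)

print(f"Exact agreement: {exact_agreement:.2f}")
print(f"Cohen's kappa:   {kappa:.2f}")
```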
Teacher Supply and Demand A third challenge to using performance assessments for licensure decisions involves perceived teacher shortages in some states, particularly in content areas such as mathematics, science, and bilingual education. Some educators, researchers, and policy makers oppose the use of such assessments in licensing because they believe they will create shortages of teachers or exacerbate existing ones. This is a legitimate concern in the first decade of the 21st century given increasing student enrollments, high numbers of teacher retirements, class-size reduction laws in many states, and high levels of attrition and migration among new teachers. At the same time, we would like to offer a different view. The use of performance assessments in licensing decisions would be unlikely to increase teacher shortages if such assessments are implemented under the following three conditions. First, new teachers must receive structured support from mentors or support teams who are trained and compensated, particularly as they go through the assessment process. Second, states must take steps to increase teacher salaries and make them equitable across districts. These two steps have the potential to reduce teacher attrition and migration and promote more effective instructional practices. Third, these steps should be accompanied by state laws that restrict or ban out-of-field teaching; i.e., the assignment of teachers to subjects that they are neither trained nor licensed to teach (Ingersoll, 1999). In Connecticut, low levels of attrition in the first years of teaching seem to be a function of high and equitable salaries as well as extensive induction support for beginning teachers. In states with lower, less equitable salaries and less comprehensive support for new teachers, performance assessments may contribute to teacher shortages. Other Uses of Performance Assessments In response to these challenges, some states have elected to use teacher performance assessments in other ways. In California, for example, CFASST is currently being used with almost all beginning teachers in the state. There has been a tremendous demand for new teachers in the state for the past several years due to increasing enrollments, teacher retirements, and class-size reduction legislation. The high levels of demand for teachers have made it more sensible for the state to use performance assessments in identifying new teachers’ professional strengths and needs rather than in making licensure decisions. With the recent statewide expansion of its induction program for new teachers, California is now in a position to use performance assessments for licensure although the high demand for new teachers persists. Instead of using performance assessments with practicing teachers, other states have chosen to require institutions of higher education to incorporate them into teacher preparation programs. By 2003-04 in Wisconsin, for example, all colleges and universities that train teachers must adopt the INTASC core standards or develop their own teacher education standards and have in place performance assessments that correspond to the standards. VI. Summary and Conclusion This paper examined state policy related to initial licensure, induction, and professional licensure in all 50 states to determine whether, in 2001-02, differences existed in these areas between the 12 states that piloted the INTASC portfolios between 1996 and 1999 and the other 38 states.
We found that the INTASC portfolio states were more likely to test new teachers’ knowledge of subject matter and pedagogy, fund statewide induction programs, and employ performance assessments for use in licensure decisions. The paper noted several challenges to the use of teacher tests and performance assessments, particularly those related to test validity, reliability, and fairness, and described variations in the quality of state induction programs as well as differences between teacher portfolios and observation-based assessments. There is little research from individual states on the consequences for student achievement of adopting these state policies, but a growing body of research indicates that teachers’ performances on tests of literacy skills and their knowledge of content and pedagogy are related to student achievement. While tests of literacy skills, content knowledge, and pedagogical knowledge reveal important information, we believe they are incomplete measures of beginning teachers’ abilities. In order to promote higher levels of teacher quality, states should strongly consider implementing statewide, funded induction programs and requiring new teachers to pass performance assessments. Such assessments, particularly those used in Connecticut and North Carolina, represent important advances over other tests because they measure and promote new teachers’ ability to integrate knowledge of content, students, and context in planning instruction, analyzing student work, and reflecting on practice.

References

Association of Mexican-American Educators v. California, 937 F. Supp. 1397 (N.D. Cal. 1996), aff’d, 183 F.3d 1055 (9th Cir. 1999).
Bransford, J.D., Brown, A.L., & Cocking, R.R. (Eds.). (1999). How people learn: Brain, mind, experience, and school. Committee on Developments in the Science of Learning, Commission on Behavioral and Social Sciences and Education, National Research Council. Washington, DC: National Academy Press.
Coleman, J.S., Campbell, E.Q., Hobson, C.J., McPartland, J., Mood, A.M., Weinfeld, F.D., & York, R.L. (1966). Equality of educational opportunity. Washington, DC: U.S. Government Printing Office.
Collins, K.M., Schutz, A.M., & Moss, P.A. (1997). INTASC candidate interviews: A summary report – Draft. Washington, DC: Interstate New Teacher Assessment and Support Consortium.
Darling-Hammond, L. (1998). Teachers and teaching: Testing policy hypotheses from a National Commission report. Educational Researcher, 27(1), 5-15.
Darling-Hammond, L., Berry, B., & Thoreson, A. (2001). Does teacher certification matter? Evaluating the evidence. Educational Evaluation and Policy Analysis, 23(1), 57-77.
Darling-Hammond, L., Wise, A.E., & Klein, S.P. (1995). A license to teach: Building a profession for 21st-century schools. Boulder, CO: Westview Press.
Dwyer, C.A. (1994). Development of the knowledge base for the Praxis III classroom performance assessments assessment criteria. Princeton, NJ: Educational Testing Service.
Ehrenberg, R.G. & Brewer, D.J. (1995). Did teachers’ verbal ability and race matter in the 1960s? Coleman revisited. Economics of Education Review, 14(1), 1-21.
Fennema, E., Carpenter, T.P., Franke, M.L., Levi, L., Jacobs, V.R., & Empson, S.B. (1996). A longitudinal study of learning to use children’s thinking in mathematics instruction. Journal for Research in Mathematics Education, 27(4), 403-434.
Ferguson, R.F. (1991). Paying for public education: New evidence on how and why money matters. Harvard Journal on Legislation, 28(2), 465-498.
Floden, R.E. & Klinzing, H.G. (1990). What can research on teacher thinking contribute to teacher preparation? A second opinion. Educational Researcher, 19(5), 15-20.
Goertz, M.E. & Pitcher, B. (1985). The impact of NTE use by states on teacher selection. Princeton, NJ: Educational Testing Service.
Goldhaber, D.D. & Brewer, D.J. (1997). Evaluating the effect of teacher degree level on educational performance. In W.J. Fowler (Ed.), Developments in school finance, 1996 (pp. 197-210). Washington, DC: National Center for Education Statistics, U.S. Department of Education.
Goldhaber, D.D. & Brewer, D.J. (2000). Does teacher certification matter? High school teacher certification status and student achievement. Educational Evaluation and Policy Analysis, 22(2), 129-146.
Graham, P.A. (1987). Black teachers: A drastically scarce resource. Phi Delta Kappan, 68(8), 598-605.
Grissmer, D.W. & Kirby, S.N. (1987). Teacher attrition: The uphill climb to staff the nation's schools. Santa Monica, CA: RAND Corporation.
Grossman, P.L. (1990). The making of a teacher: Teacher knowledge and teacher education. New York: Teachers College Press.
Haertel, E.H. (1991). New forms of teacher assessment. In G. Grant (Ed.), Review of Research in Education, 17, 3-29.
Haney, W., Madaus, G., & Kreitzer, A. (1987). Charms talismanic: Testing teachers for the improvement of American education. In E.Z. Rothkopf (Ed.), Review of Research in Education, 14, 169-238.
Hanushek, E.A. (1992). The trade-off between child quantity and quality. Journal of Political Economy, 100(1), 84-117.
Ingersoll, R.M. (1999). Teacher turnover, teacher shortages, and the organization of schools. A CTP Working Paper. Seattle, WA: Center for the Study of Teaching and Policy.
Klein, S.P. (1998). Standards for teacher tests. Journal of Personnel Evaluation in Education, 12(2), 123-138.
Kuligowski, B., Holdzkom, D., & French, R. (1993). Teacher performance evaluation in the southeastern states: Forms and functions. Journal of Personnel Evaluation in Education, 1, 335-358.
Livingston, S. (1993). Inter-assessor consistency of the Praxis III classroom performance assessment: Spring 1992 preliminary version. Unpublished report. Princeton, NJ: Educational Testing Service.
Melnick, S. & Pullin, D. (2000). Can you take dictation? Prescribing teacher quality through testing. Journal of Teacher Education, 51(4), 262-275.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5-11.
Monk, D.H. & King, J.A. (1994). Multilevel teacher resource effects in pupil performance in secondary mathematics and science: The case of teacher subject matter preparation. In R.G. Ehrenberg (Ed.), Choices and consequences: Contemporary policy issues in education (pp. 29-58). Ithaca, NY: ILR Press.
Moss, P.A. (1998a). Rethinking validity in the assessment of teaching. In N. Lyons & G. Grant (Eds.), With portfolio in hand: Portfolios in teaching and teacher education (pp. 202-219). New York: Teachers College Press.
Moss, P.A. (1998b). Response to Porter, Odden, & Youngs. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Murnane, R.J., Singer, J.D., Willett, J.B., Kemple, J.J., & Olsen, R.J. (1991). Who will teach? Policies that matter. Cambridge, MA: Harvard University Press.
National Center for Education Statistics. (1997). Characteristics of stayers, movers, and leavers: Results from the Teacher Follow-up Survey, 1994-95. Washington, DC: U.S. Department of Education.
Pecheone, R.L. & Stansbury, K. (1996). Connecting teacher assessment and school reform. The Elementary School Journal, 97(2), 163-177.
Porter, A.C., Youngs, P., & Odden, A. (2001). Advances in teacher assessment and their uses. In V. Richardson (Ed.), Handbook of Research on Teaching (4th ed., pp. 259-297).
Resnick, L.B. (1987). Education and learning to think. Washington, DC: National Academy Press.
Rowan, B., Chiang, F., & Miller, R.J. (1997). Using research on employees’ performance to study the effects of teachers on students’ achievement. Sociology of Education, 70(4), 256-284.
Shulman, L.S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1-22.
Smith, I.L., Miller, M.C., & Joy, J. (1988). A case study of the impact of performance-based testing on the supply of minority teachers. Journal of Teacher Education, 39(4), 45-53.
Wayne, A.J. & Youngs, P. (under review). Teacher characteristics and student achievement gains: A review. Review of Educational Research.

Biographical Statements

Peter Youngs is a research associate with the Wisconsin Center for Education Research. He is a doctoral student in the Department of Educational Policy Studies at the University of Wisconsin-Madison. His research interests include state and district policy related to teacher licensure, induction, professional development, and school reform.

Allan Odden is professor of educational administration at the University of Wisconsin-Madison and co-director of the Consortium for Policy Research in Education (CPRE). He is an international expert on teacher compensation, education finance, and educational policy. He is currently directing research projects on school finance redesign, resource allocation in schools, the costs of instructional improvement, and teacher compensation. He is a past president of the American Educational Finance Association and received AEFA’s distinguished service award in 1998.

Andrew Porter is professor of educational psychology and director of the Wisconsin Center for Education Research at the University of Wisconsin-Madison. He has published widely on psychometrics, student assessment, education indicators, and research on teaching. His current work focuses on curriculum policies and their effects on opportunity to learn. He is an elected member and officer of the National Academy of Education and president of the American Educational Research Association.