
CBE—Life Sciences Education

Vol. 12, 618–627, Winter 2013

Article

The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize University STEM Classroom Practices
Michelle K. Smith,* Francis H. M. Jones,† Sarah L. Gilbert,‡ and Carl E. Wieman‡
*School of Biology and Ecology and Maine Center for Research in STEM Education, University of Maine–Orono,
Orono, ME 04469-5751; † Department of Earth, Ocean, and Atmospheric Sciences, University of British Columbia,
Vancouver, BC V6T 1Z4, Canada; ‡ Carl Wieman Science Education Initiative, University of British Columbia,
Vancouver, BC V6T 1Z3, Canada

Submitted August 10, 2013; Revised September 8, 2013; Accepted September 9, 2013
Monitoring Editor: Erin L. Dolan

Instructors and the teaching practices they employ play a critical role in improving student learning
in college science, technology, engineering, and mathematics (STEM) courses. Consequently, there
is increasing interest in collecting information on the range and frequency of teaching practices
at department-wide and institution-wide scales. To help facilitate this process, we present a new
classroom observation protocol known as the Classroom Observation Protocol for Undergraduate
STEM or COPUS. This protocol allows STEM faculty, after a short 1.5-hour training period, to
reliably characterize how faculty and students are spending their time in the classroom. We present
the protocol, discuss how it differs from existing classroom observation protocols, and describe the
process by which it was developed and validated. We also discuss how the observation data can be
used to guide individual and institutional change.

DOI: 10.1187/cbe.13-08-0154
Address correspondence to: Michelle K. Smith (michelle.k.smith@maine.edu).

© 2013 M. K. Smith et al. CBE—Life Sciences Education © 2013 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0). “ASCB®” and “The American Society for Cell Biology®” are registered trademarks of The American Society for Cell Biology.

Supplemental Material can be found at: http://www.lifescied.org/content/suppl/2013/11/14/12.4.618.DC1.html

INTRODUCTION

A large and growing body of research indicates that undergraduate students learn more in courses that use active-engagement instructional approaches (Prince, 2004; Knight and Wood, 2005; Michael, 2006; Blanchard et al., 2010). As a result, the importance of teaching science, technology, engineering, and mathematics (STEM) courses more effectively has been stressed in numerous reports, including the President’s Council of Advisors on Science and Technology Engage to Excel report (2012), the National Science Foundation/American Association for the Advancement of Science Vision and Change report (AAAS, 2010), and the National Research Council Discipline-Based Education Research report (Singer et al., 2012). Given these compelling, evidence-based recommendations and the recognized need for measures of teaching effectiveness beyond student evaluations (Association of American Universities, 2011), higher education institutions are struggling to determine the extent to which faculty members are teaching in an interactive manner. This lack of information is a major barrier to transforming instruction and to evaluating the success of programs that support such change.

To collect information about the nature of STEM teaching practices as a means to support institutional change, faculty at both the University of British Columbia (UBC) and the University of Maine (UMaine) created classroom observation programs. The results of such observations were needed to: 1) characterize the general state of STEM classroom teaching at both institutions, 2) provide feedback to instructors who desired information about how they and their students were spending time in class, 3) identify faculty professional development needs, and 4) check the accuracy of faculty reporting on the Teaching Practices Survey that is now in use at UBC (CWSEI Teaching Practices Survey, 2013).


To achieve these goals, the programs needed an observation protocol that could be used by faculty member observers to reliably characterize how students and instructors were spending their time in undergraduate STEM classrooms. A critical requirement of the protocol was that observers who were typical STEM faculty members could achieve those results with only 1 or 2 hours of training, as it is unrealistic to expect they would have more time than that available. In the quest for a suitable observation protocol, multiple existing options were considered and ultimately rejected.

The observation protocols considered were divided into two categories: open-ended or structured. When observers use open-ended protocols, they typically attend class, make notes, and respond to such statements as: “Comment on student involvement and interaction with the instructor” (Millis, 1992). Although responses to these types of questions can provide useful feedback to observers and instructors, the data are observer dependent and cannot easily be standardized or compared across multiple classrooms (e.g., all STEM courses at UBC or UMaine).

Alternatively, structured protocols provide a common set of statements or codes to which the observers respond. Often, these protocols ask observers to make judgments about how well the teaching conforms to a specific standard. Examples of such protocols include the Inside the Classroom: Observation and Analytic Protocol (Weiss et al., 2003) and the Reformed Teaching Observation Protocol (RTOP; Sawada et al., 2002). These protocols consist of statements that observers typically score on a Likert scale from “not at all” to “to a great extent” and contain such statements as: “The teacher had a solid grasp of the subject matter content inherent in the lesson” (from RTOP; Sawada et al., 2002).

The RTOP in particular has been used to observe university STEM instruction. For example, it has been used to evaluate university-level courses at several different institutions to measure the effectiveness of faculty professional development workshops (Ebert-May et al., 2011) and to compare physics instructors in a study examining coteaching as a method to help new faculty develop learner-centered teaching practices (Henderson et al., 2011). The RTOP is also being used to characterize classroom practices in many institutions and at all levels of geoscience classes (Classroom Observation Project, 2011).

The RTOP was found to be unsuitable for the UBC and UMaine programs for two main reasons. The first is that the protocol involves many observational judgments that can be awkward to share with the instructor and/or the larger university community. The second is that observers must complete a multiday training program to achieve acceptable interrater reliability (IRR; Sawada et al., 2002).

More recently, new observation protocols have been developed that describe instructional practices without any judgment as to whether or not the practices are effective or aligned with specific pedagogic strategies. These observation protocols use a series of codes to characterize instructor and/or student behaviors in the classroom; observers indicate how often each behavior occurs during a class period (Hora et al., 2013; West et al., 2013). One observation protocol in particular, the Teaching Dimensions Observation Protocol (TDOP), was expressly developed to observe postsecondary nonlaboratory courses. For this protocol, observers document classroom behaviors in 2-min intervals throughout the duration of the class session (Hora et al., 2013). The possible classroom behaviors are described in 46 codes in six categories, and observers make a checkmark when any of the behaviors occur.

The TDOP instrument avoids the judgment issues associated with the RTOP, but it still requires substantial training, as one might expect for a protocol that was designed to be a complex research instrument. Preliminary work suggests that, after a 3-day training session, observers have acceptable IRR scores when using the TDOP (Hora et al., 2013). Observers at our institutions tried using this instrument, but without the full training, they found it difficult to use the TDOP in a reliable way, due to the complexity of the items being coded and the large number of possible behavior codes. We also found that the particular research questions it was designed to address did not entirely align with our needs. For example, it covers some aspects that are not necessary for faculty observation programs, such as whether an instructor uses instructional artifacts (e.g., a laser pointer or computer; Hora et al., 2013), and fails to capture others that are needed, such as whether an instructor encourages peer discussion along with clicker questions (Mazur, 1997; Smith et al., 2009, 2011). We also wanted to characterize the student behaviors during the class period better than the TDOP easily allowed.

Out of necessity, we created a new protocol called the Classroom Observation Protocol for Undergraduate STEM, or COPUS. Like the TDOP, this new protocol documents classroom behaviors in 2-min intervals throughout the duration of the class session, does not require observers to make judgments of teaching quality, and produces clear graphical results. However, COPUS is different in that it is limited to 25 codes in only two categories (“What the students are doing” and “What the instructor is doing”) and can be reliably used by university faculty with only 1.5 hours of training (Figure 1 has a description of the codes; the Supplemental Material includes the full protocol and coding sheet). Observers ranging from STEM faculty members without a background in science education research to K–12 STEM teachers have reliably used this protocol to document instruction in undergraduate science, math, and engineering classrooms. Taken together, their results show the broad usability of COPUS.

DEVELOPMENT

The development of COPUS was an evolutionary process extending across more than 2 years, involving many iterations and extensive testing. It began at UBC, where science education specialists (SESs) who were working with science faculty on improving teaching (Wieman et al., 2010) wanted to characterize what both the students and instructors were doing during class. The SESs began testing various existing protocols, including the TDOP, in different classes at UBC in late 2011 and early 2012. The original TDOP did not meet our needs (as described above), so we iteratively modified the protocol through nine different versions. These changes resulted in a format, procedure, data structure, and coding strategy that was easy to implement on paper or electronically and convenient for analysis and display. The overall format of the observation protocol remained largely stable, but the categories and codes continued to evolve.

Figure 1. Descriptions of the COPUS student and instructor codes.

During the Fall term of 2012, 16 SESs, who are highly trained and experienced classroom observers, used this evolving protocol to observe a variety of courses in singles, pairs, or trios across most of the departments in the UBC Faculty of Science (including the disciplines of biology, computer science, earth sciences, mathematics, physics, and statistics). We analyzed the SES-generated observation data to identify coding disagreements and met with the SESs to discuss the evolving protocol and coding. These discussions covered observed behaviors they found difficult to code and/or hard to interpret, as well as other important elements of instructor or student behavior they felt were not being adequately captured. The protocol evolved through five different versions during this stage of testing and feedback. The final version had substantially simplified categories, and all identified problems with the wording of the codes had been eliminated. Notably, it was quite simple to reliably code classes taught as traditional lectures, because only a very small number of behaviors need to be coded. Therefore, the majority of the work went into improving the protocol so it could reliably characterize classes that had substantial and varied interactions between instructor and students and multiple student activities.

One substantial change during Fall 2012 was eliminating a category for judging the cognitive level of the activities. Observers had been asked to code the level of cognitive sophistication of current classroom activities, based on Bloom’s taxonomy of educational objectives (Bloom et al., 1956). After multiple unsuccessful attempts to find a simple and reliable coding scheme that could capture this aspect of the classroom activities, we dropped this category. Our decision is supported by recent work showing that, when faculty members write and evaluate higher-order questions, they use several criteria beyond Bloom’s level, including: question difficulty, time required to answer the questions, whether students are using a new or well-practiced approach, and whether the questions have multiple reasonable solutions (Lemons and Lemons, 2012).

The second substantial change during this time was making another category, coding the level of student engagement, optional rather than required. Having a measure of student engagement is useful for providing feedback to the instructor and for judging the overall effectiveness of many instructional activities. With the coding of the levels of engagement simplified to only discriminating between low (0–20% of the students engaged), medium, or high (≥80% of the students engaged), some observers, particularly those who had some experience with observing levels of student engagement, could easily code engagement along with the other two categories, and there was reasonable consistency between observers. However, less-experienced observers found it quite hard to simultaneously code what the students were doing, what the instructor was doing, and the student engagement level. There were also difficulties in obtaining consistent coding of student engagement across all observers; the judgments were often dependent on the levels of engagement common to the specific disciplines and courses with which the observers were familiar. For this reason, the student engagement category was made optional, and we recommend that observers not try to code it until they have become experienced at coding the “What the students are doing” and “What the instructor is doing” categories.
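For readers who want to apply the optional engagement category electronically, the thresholds above reduce to a small rule. The following Python sketch is purely illustrative and not part of COPUS itself; the function name and the treatment of the unlabeled 20–80% band as “medium” are our assumptions.

```python
def engagement_level(fraction_engaged: float) -> str:
    """Map the fraction of students engaged (0.0-1.0) onto the optional
    COPUS engagement levels: low (0-20%), medium, or high (>=80%).
    Boundary handling for the unnamed middle band is an assumption."""
    if fraction_engaged <= 0.20:
        return "low"
    if fraction_engaged >= 0.80:
        return "high"
    return "medium"
```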


Another recurring theme of the discussions with the SESs was the extent to which classroom observations could accurately capture the quality of instruction or the efficacy of student work. In the end, after SESs observed different classes across many disciplines, there was a consensus that accurately evaluating the quality of instruction and the efficacy of student work was generally not possible. These highly trained and experienced observers concluded that such evaluations require a high degree of training of the observer in the material and the pedagogic strategies, as well as familiarity with the student population (prior knowledge, typical classroom behaviors, etc.). We concluded that quality judgments of this type were not realistic goals for limited classroom observations carried out by STEM faculty members. Thus, the present version of COPUS captures the actions of both instructors and students but does not attempt to judge the quality of those actions for enhancing learning.

After the completion of this development work at UBC, COPUS was further tested by 16 K–12 teachers participating in a teacher professional development program at UMaine. The teachers used COPUS to observe 16 undergraduate STEM courses in five different departments (biology, engineering, math, chemistry, and physics). While the teachers easily interpreted many of the codes, they found a few to be difficult and suggested additional changes. For example, the student code “Listening: paying attention/taking notes, etc.” was changed to “Listening to instructor/taking notes, etc.” The code was clarified so observers knew they should select it only when the students were listening to their instructor, not when students were listening to their peers. Also, new codes were added to capture behaviors the teachers thought were missing, such as the instructor code “AnQ: Listening to and answering student questions with entire class listening.”

The coding patterns of the two teacher observers in the same classroom were also compared to determine which specific codes were difficult to use consistently. An example comparing two teachers employing the student code “Ind” is shown in Figure 2. Figure 2A compares how two observers marked this code in the first iteration of testing, when it was described as “Ind: Individual thinking/problem solving in response to assigned task.” Observer 2 marked this code throughout most of the class, and observer 1 marked it intermittently. Follow-up conversations with observer 2 and other teachers indicated that some observers were marking this code throughout the duration of the class, because they assumed individual students were thinking while taking notes, working on questions, and so on, but other observers were not. Therefore, we clarified the code to read: “Ind: Individual thinking/problem solving. Only mark when an instructor explicitly asks students to think about a clicker question or another question/problem on their own.”

Figure 2. A comparison of how two observers coded the student code “Ind.” (A) When the code was described as “Ind: Individual thinking/problem solving in response to assigned task,” observer 2 marked this code more often than observer 1 did. (B) Coding after the description of the code was revised.


Table 1. Information on the courses observed using the final version of the COPUS

Institution | Number of classes observed | Number of different STEM departments | Percentage of courses at the introductory level^a | Percentage of classes with >100 students
UBC | 8 | 4^b | 100 | 63
UMaine | 23 | 7^c | 96 | 35

^a STEM courses at the first- and second-year levels.
^b Biology, chemistry, math, and physics.
^c Biology, molecular biology, engineering, chemistry, math, physics, and geology.

Figure 2B shows a comparison of the same observer pair using the revised “Ind” code; their codes were now closely aligned.

In addition, the teacher observation data revealed a more general problem: there was a lower degree of consistency in coding student behaviors than in coding instructor behaviors, and the teachers used a very limited set of codes for the student behaviors. The earlier coding by the SESs had shown similar, but less dramatic, trends. We realized that this problem was due to a natural tendency of observers to focus on the instructor, combined with the fact that the instructor-related codes came first on the survey form. Therefore, the protocol was changed so that the student codes are viewed first, and we emphasized coding student behaviors during subsequent training sessions (see further details below in the Training section). As shown below, these changes appear to have fixed this problem.

These further revisions culminated in a final version of the COPUS. This version was tested by having the same 16 K–12 teachers use it to observe 23 UMaine STEM classes, and by having seven STEM faculty observers use it to observe eight UBC classrooms in pairs after 1.5 hours of training. Information about the types of classes observed is in Table 1. The seven UBC STEM faculty member volunteers who used the final protocol had not previously used the protocol and were not involved in the development process. Thus, the IRR of the protocol has been tested with a sample of observers with a wide range of backgrounds and perspectives. As discussed in Validity and Reliability, the IRR was high.

TRAINING

A critical design feature of the COPUS is that college and university faculty who have little or no observation protocol experience and minimal time for training can use it reliably. We summarize the training steps in the following paragraphs, and we have also included a step-by-step facilitator guide in the Supplemental Material.

The first step in the training process is to have the observers become familiar with the codes. At UBC, facilitators displayed the student and instructor codes (Figure 1) and discussed with the observers what each behavior typically looks like in the classroom. At UMaine, the teacher observers played charades: each teacher randomly selected a code description from a hat and silently acted out the behavior, while the remaining observers, who had the code descriptions in front of them, guessed the code. The remainder of the training was the same for both groups, with a total training duration of 2 hours for the K–12 teachers and 1.5 hours for the UBC faculty members.

Second, observers were given paper versions of the coding sheet and practiced coding a 2-min segment of a classroom video. An excerpt from the coding sheet is shown in Figure 3, and the complete coding sheet is included in the Supplemental Material. Observers often mark more than one code within a single 2-min interval. The first video we used showed an instructor making administrative announcements and lecturing while the class listened. After 2 min, the video was paused, and the group discussed which codes they selected. Because faculty at other institutions may have difficulty capturing videos for training, we have included web URLs to various video resources that can be used for training (Table 2).

The observers were then asked to form pairs and code 8 min of a video from a large-enrollment, lecture-style science class at UMaine that primarily shows an instructor lecturing and students listening, with a few questions asked by both the instructor and students. To keep the observers synchronized and to ensure they were filling out a new row in the observation protocol at identical 2-min intervals, they used either cell phones set to count time up or a sand timer.

Figure 3. An excerpt of the COPUS coding form. Observers place a single checkmark in the box if a behavior occurs during a 2-min segment.
Multiple codes can be marked in the same 2-min block.
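For groups that prefer to record observations electronically rather than on paper, one natural representation of the coding form in Figure 3 is a boolean grid with one row per 2-min interval and one column per code. The Python sketch below is our illustration only; the code lists are abbreviated and the function names are hypothetical, not part of the published protocol.

```python
# A COPUS sheet as {code: [checked?, ...]}, one slot per 2-min interval.
STUDENT_CODES = ["L", "Ind", "CG", "WG", "AnQ", "SQ", "W", "O"]        # abbreviated
INSTRUCTOR_CODES = ["Lec", "RtW", "FUp", "PQ", "CQ", "AnQ", "Adm", "W", "O"]

def new_sheet(class_minutes, codes):
    """One boolean list per code, with one entry per 2-min interval."""
    intervals = class_minutes // 2
    return {code: [False] * intervals for code in codes}

def mark(sheet, code, minute):
    """Check off `code` for the 2-min interval containing `minute`."""
    sheet[code][minute // 2] = True

# A 50-min class has 25 rows; several codes can share one interval.
students = new_sheet(50, STUDENT_CODES)
mark(students, "L", 0)     # students listening during minutes 0-2
mark(students, "CG", 14)   # discussing a clicker question in minutes 14-16
mark(students, "L", 14)    # codes can co-occur in the same 2-min block
```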


Table 2. Video resources that may be helpful for COPUS training

Description of video | URL
Demonstration, clicker questions, and lecture | http://harvardmagazine.com/2012/02/interactive-teaching
Group activities and lecture | http://podcasting.gcsu.edu/4DCGI/Podcasting/UGA/Episodes/12746/614158822.mov
Clicker questions and lecture | http://podcasting.gcsu.edu/4DCGI/Podcasting/UGA/Episodes/22253/27757327.mov
Clicker, real-time writing, and lecture | http://ocw.mit.edu/courses/chemistry/5-111-principles-of-chemical-science-fall-2008/video-lectures/lecture-19
Real-time writing, asking/answering questions, and lecture | http://ocw.mit.edu/courses/biology/7-012-introduction-to-biology-fall-2004/video-lectures/lecture-6-genetics-1

At the end of 8 min, the observers compared their codes with their partners. Next, as a large group, observers took turns stating what they coded for the students and the instructor every 2 min for the 8-min video clip. At this point, the observers talked about the relationship between a subset of the student and instructor codes. For example, if the observers check the student code “CG: Discuss clicker question,” they will also likely check the instructor code “CQ: Asking a clicker question.”

To provide the observers with practice coding a segment that has more complicated student and instructor codes, they next coded a different classroom video segment from the same large-enrollment, lecture-style science class at UMaine, but this time with the camera focused on the students. This video segment included students asking the instructor questions, students answering questions from the instructor, and clicker questions with both individual thought and peer discussion. The observers coded 2 min and then paused to discuss the codes. Then observers in pairs coded for an additional 6 min, again taking care to use synchronized 2-min increments. The observer pairs first compared their codes with their partners, and then the whole group discussed the student and instructor codes for each of the 2-min segments of the 6-min clip. At this point, the training was complete.

VALIDITY AND RELIABILITY

COPUS is intended to describe the instructor and student actions in the classroom, but it is not intended to be linked to any external criteria. Hence, the primary criterion for validity is that experts and observers with the intended background (STEM faculty and teachers) see it as describing the full range of normal classroom activities of students and instructors. That validity was established during the development process by the feedback from the SESs, the K–12 teachers, and those authors (M.S., F.J., C.W.) who have extensive experience with STEM instruction and classroom observations.

A major concern has been to ensure that there is a high level of IRR when COPUS is used after the brief period of training described above. To assess the IRR, we examined the agreement between pairs of observers as they used the final version of COPUS in STEM classes at both UBC and UMaine. The two observers sat next to each other in the classroom, so they could keep identical 2-min time increments, but they were instructed not to compare codes with each other.

To summarize how similarly observer pairs used each code on the final version of the COPUS, we calculated Jaccard similarity scores (Jaccard, 1901) for each code and then averaged the scores for both the UBC and UMaine observers (Table 3).

Table 3. Average Jaccard similarity scores for COPUS codes across all pairs observing in all courses, for both UBC faculty observers and Maine K–12 teacher observers; numbers closer to 1 indicate the greatest similarity between two observers

Student code | UBC | UMaine
L: Listening | 0.95 | 0.96
Ind: Individual thinking/problem solving | 0.97 | 0.91
CG: Discuss clicker question | 0.98 | 0.97
WG: Working in groups on worksheet activity | 0.98 | 0.99
OG: Other group activity | Not used | 0.97
AnQ: Students answer question posed by instructor | 0.91 | 0.84
SQ: Student asks question | 0.96 | 0.93
WC: Engaged in whole-class discussion | 0.96 | 0.98
Prd: Making a prediction about the outcome of demo or experiment | Not used | 1.00
SP: Presentation by students^a | Not used | Not used
TQ: Test or quiz^a | Not used | Not used
W: Waiting | 0.99 | 0.98
O: Other | 0.94 | 0.99

Instructor code | UBC | UMaine
Lec: Lecturing | 0.91 | 0.92
RtW: Real-time writing | 0.93 | 0.93
FUp: Follow-up on clicker questions or activity | 0.92 | 0.85
PQ: Posing nonclicker questions | 0.86 | 0.80
CQ: Asking a clicker question | 0.93 | 0.97
AnQ: Answering student questions | 0.94 | 0.89
MG: Moving through the class | 0.96 | 0.97
1o1: One-on-one discussions with students | 0.94 | 0.96
D/V: Conducting a demo, experiment, etc. | 0.97 | 0.98
Adm: Administration | 0.94 | 0.97
W: Waiting | 0.95 | 0.98
O: Other | 0.97 | 1.00

^a “SP: Presentation by students” and “TQ: Test/quiz” were not selected in any of the observations at UBC or UMaine. This result likely occurred because, when we asked UBC and UMaine faculty members if we could observe their classes, we also asked them whether there was anything unusual going on in their classes that day. We avoided classes with student presentations and tests/quizzes, because these situations would limit the diversity of codes that could be selected by the observers.


For single codes, we calculated Jaccard similarity scores instead of IRR Cohen’s kappa values, because observer pairs occasionally marked the same code for every 2-min increment throughout the duration of the class. For example, in a class that is lecture-based, observers would likely mark the student code “L: Listening” for the entire time. In a case such as this, the observer opinion is defined as a constant rather than a variable, which interferes with the IRR calculation.

The equation for the Jaccard coefficient is T = nc/(na + nb − nc), where nc is the number of 2-min increments that are marked the same (either checked or not checked) by both observers; na is nc plus the number of 2-min increments that observer 1 marked but observer 2 did not; and nb is nc plus the number of 2-min increments that observer 2 marked but observer 1 did not. For example, for the data in Figure 2B, the class period is 42 min in length, so there are 21 possible 2-min segments. The student code “Ind: Individual thinking” was marked 12 times by both observers, not marked eight times by both observers, and marked by observer 2 one time when observer 1 did not, so nc = 20, na = 20, and nb = 21. Therefore, the calculation is: 20/(20 + 21 − 20) = 0.95. Numbers closer to 1 indicate greater consistency between how the two observers coded the class.

Eighty-nine percent of the similarity scores are greater than 0.90, and the lowest is 0.80. These values indicate strong similarity between how two observers use each code. The lowest score for both the UBC and UMaine observers was for the instructor code “PQ: Posing nonclicker questions.” Comments from observers suggest that, when instructors were following up/giving feedback on clicker questions or activities, they often posed questions to the students. Observers checked the instructor code “FUp: Follow-up” to describe this behavior but stated they occasionally forgot to also select the instructor code “PQ.”

To compare observer reliability across all 25 codes in the COPUS protocol, we calculated Cohen’s kappa IRR scores using SPSS (IBM, Armonk, NY). To compute the kappa values for each observer pair, we added up the total number of times: 1) both observers put a check in the same box, 2) neither observer put a check in the same box, 3) observer 1 put a check in a box when observer 2 did not, and 4) observer 2 put a check in a box when observer 1 did not. For example, at UBC, when looking at all 25 codes in the COPUS, one observer pair had the following results: 1) both observers put a check in 83 of the same boxes, 2) neither observer put a check in 524 of the boxes, 3) observer 1 marked six boxes when observer 2 did not, and 4) observer 2 marked 12 boxes that observer 1 did not. Using data such as these, we computed the kappa score for each of the eight UBC and 23 UMaine pairs and report the average scores in Table 4. We also repeated this calculation using either the subset of 13 student codes or the subset of 12 instructor codes (Table 4).

The average kappa scores ranged from 0.79 to 0.87 (Table 4). These are considered to be very high values for kappa and thus indicate good IRR (Landis and Koch, 1977). Notably, the kappa values, as well as the Jaccard similarity scores, are comparably high for both UBC faculty and UMaine K–12 teacher observers, indicating that COPUS is reliable when used by observers with a range of backgrounds and 2 hours or fewer of training.

ANALYZING COPUS DATA

To determine the prevalence of different codes in various classrooms, we added up how often each code was marked by both observers and then divided by the total number of codes shared by both observers. For example, if both observers marked “Instructor: Lecture” at the same 13 time intervals in a 50-min class period and agreed on marking 25 instructor codes total for the duration of the class, then the lecture code occurred 13/25, or 52%, of the time for the instructor.

We visualized the prevalence of the student and instructor codes using pie charts. Figure 4 shows observation results from two illustrative classes: one that is primarily lecture-based and one in which a combination of active-learning strategies is used. The latter class is clearly differentiated from the lecture-based class. This example illustrates how, at a glance, this visual representation of the COPUS results provides a highly informative characterization of the student and instructor activities in a class.

At a department- or institution-wide level, there are several ways to categorize the range of instructional styles. One of the simplest is to look at the prevalence of the student code “L: Listening to instructor/taking notes, etc.” across all courses observed, because this student code is the most indicative of passive student behavior in response to faculty lecturing (“Lec”) with or without real-time writing (“RtW”). Figure 5 shows that at both institutions the “L” code was marked 26–75% of the time. However, at UMaine, some of the classes have greater than 76% of the student codes devoted to listening. Faculty who teach these classes may benefit from professional development activities about how to design an effective active-learning classroom.

In addition, the data can be analyzed for the subset of faculty members who are using active-learning strategies, such as asking clicker questions. Thirty-eight percent of UBC and 43% of the UMaine classes that were observed used clickers. However, student code prevalence in these classes shows that not all faculty members used clicker questions accompanied by recommended strategies, such as peer discussion (Mazur, 1997; Smith et al., 2009, 2011; Figure 6). Faculty members who are not allowing time for peer discussion may benefit from professional development on how to integrate peer discussion into clicker questions.

Table 4. Average IRR kappa scores from the observations at UBC and UMaine

Observers | All codes (±SE) | Student codes (±SE) | Instructor codes (±SE)
Faculty observing UBC courses | 0.83 (0.03) | 0.87 (0.04) | 0.79 (0.04)
Teachers observing UMaine courses | 0.84 (0.03) | 0.87 (0.04) | 0.82 (0.04)
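To make the two agreement measures concrete, the following Python sketch reimplements them from the definitions above. It is our illustration, not the authors' analysis scripts (the paper used SPSS for kappa); the inputs reproduce the “Ind” example from Figure 2B and the UBC observer-pair counts quoted in the text.

```python
def jaccard_similarity(obs1, obs2):
    """Jaccard-style similarity T = nc / (na + nb - nc) for one code,
    where matches (both checked or both unchecked) count toward nc."""
    nc = sum(a == b for a, b in zip(obs1, obs2))             # marked the same
    na = nc + sum(a and not b for a, b in zip(obs1, obs2))   # plus obs1-only marks
    nb = nc + sum(b and not a for a, b in zip(obs1, obs2))   # plus obs2-only marks
    return nc / (na + nb - nc)

def cohens_kappa(both, neither, only1, only2):
    """Cohen's kappa from the four check/no-check counts pooled over
    all codes and all 2-min intervals for one observer pair."""
    n = both + neither + only1 + only2
    p_observed = (both + neither) / n
    p1_yes = (both + only1) / n   # observer 1 marginal "checked" rate
    p2_yes = (both + only2) / n   # observer 2 marginal "checked" rate
    p_chance = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    return (p_observed - p_chance) / (1 - p_chance)

# "Ind" code from Figure 2B: 21 intervals; 12 both-marked, 8 both-unmarked,
# 1 marked by observer 2 only -> T = 20 / (20 + 21 - 20) = 0.95.
ind1 = [True] * 12 + [False] * 9
ind2 = [True] * 12 + [False] * 8 + [True]
print(round(jaccard_similarity(ind1, ind2), 2))  # 0.95

# UBC pair quoted in the text: 83 both, 524 neither, 6 obs1-only,
# 12 obs2-only -> kappa is approximately 0.89 for this single pair.
print(round(cohens_kappa(83, 524, 6, 12), 2))  # 0.89
```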


Figure 4. A comparison of COPUS results from two courses that have different instructional approaches.

Figure 5. Prevalence of the student code “L: Listening” across several UBC and UMaine classes.

Figure 6. Prevalence of student codes in four example courses that use clickers. In courses that use clickers with no or minimal peer discussion, the students are passively listening the majority of the time.
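The prevalence calculation behind pie charts like those in Figures 4–6 is a simple normalization of the agreed-upon marks. This Python sketch is our illustration; the “Lec” count matches the 13/25 worked example in the text, while the other code counts are hypothetical filler.

```python
from collections import Counter

def code_prevalence(agreed_marks):
    """Fraction of the agreed instructor (or student) marks taken by each
    code. `agreed_marks` lists one entry per mark both observers made."""
    counts = Counter(agreed_marks)
    total = sum(counts.values())
    return {code: count / total for code, count in counts.items()}

# Worked example from the text: both observers marked "Lec" at the same
# 13 intervals out of 25 agreed instructor marks -> 13/25 = 52%.
marks = ["Lec"] * 13 + ["RtW"] * 7 + ["AnQ"] * 3 + ["Adm"] * 2
for code, fraction in sorted(code_prevalence(marks).items()):
    print(f"{code}: {fraction:.0%}")   # Adm: 8%, AnQ: 12%, Lec: 52%, RtW: 28%
```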


DISCUSSION AND SUMMARY

COPUS was developed because university observation programs needed a protocol to: 1) characterize the general state of teaching, 2) provide feedback to instructors who desired information about how they and their students were spending class time, and 3) identify faculty professional development needs. COPUS meets all of these goals by allowing observers with little observation protocol training and experience to reliably characterize what both faculty and students are doing in a classroom.

There are several uses for COPUS data. On an individual level, faculty members can receive pie charts with their code prevalence results (examples in Figure 4). These results provide a nonthreatening way to help faculty members evaluate how they are spending their time. We discovered that faculty members often did not have a good sense of how much time they spent on different activities during class, and they found COPUS data helpful.

In addition, faculty members can use COPUS data in their tenure and promotion documents to supplement their normal documentation, which typically includes student evaluation information and a written description of classroom practices. Having observation data gives faculty members substantially more information to report about their use of active-learning strategies than is usually the case.

COPUS data can also be used to develop targeted professional development. For example, anonymized, aggregate COPUS data across all departments have been shared with the UMaine Center for Excellence in Teaching and Assessment, so workshops and extended mentoring opportunities can better target the needs of the faculty. One area in particular that will be addressed in an upcoming professional development workshop is using clickers in a way that promotes peer discussion. The idea for this workshop came about as a result of the COPUS evidence showing the prevalence of UMaine STEM classes that were using clickers but allowing no or minimal time for recommended student peer discussions (Figure 6).

Other planned uses for COPUS include carrying out systematic observations of all instructors in a department at UBC in order to characterize teaching practices. The information will be used with other measures to characterize current usage of research-based instructional practices across the department’s courses and curriculum.

In the end, the choice of observation protocol and strategy will depend on the needs of each unique situation. COPUS is easy to learn, characterizes nonjudgmentally what instructors and students are doing during a class, and provides data that can be useful for a wide range of applications, from improving an individual’s teaching or a course to comparing practices longitudinally or across courses, departments, and institutions.

ACKNOWLEDGMENTS

This work was supported at UBC through the Carl Wieman Science Education Initiative and by the National Science Foundation under grant #0962805. We are grateful for the assistance of all of the UBC SESs who contributed to the development of the survey; Lisa McDonnell and Bridgette Clarkston for running the UBC training session; MacKenzie Stetzer, Susan McKay, Erika Allison, Medea Steinman, and Joanna Meyer for helping to run the UMaine training session; Jeremy Smith for developing scripts to parse and analyze the data; the Maine K–12 teachers and UBC faculty volunteers who served as observers; and the faculty at UBC and UMaine who allowed their courses to be observed.

Approval to observe classrooms and instruction at UBC and publish results of that work is provided to the Carl Wieman Science Education Initiative by the University of British Columbia under the policy on institutional research. Approval to evaluate teacher observations of classrooms (exempt status, protocol no. 2013-02-06) was granted by the Institutional Review Board at the University of Maine.

REFERENCES

American Association for the Advancement of Science (2010). Vision and Change: A Call to Action, Washington, DC.

Association of American Universities (2011). Five-Year Initiative for Improving Undergraduate STEM Education, AAU, Washington, DC. www.aau.edu/WorkArea/DownloadAsset.aspx?id=14357 (accessed 7 August 2013).

Blanchard MR, Southerland SA, Osborne JW, Sampson VD, Annetta LA, Granger EM (2010). Is inquiry possible in light of accountability? A quantitative comparison of the relative effectiveness of guided inquiry and verification laboratory instruction. Sci Educ 94, 577–616.

Bloom B, Engelhart MD, Furst EJ, Hill WH, Krathwohl DR (1956). Taxonomy of Educational Objectives: The Classification of Educational Goals, Handbook I: Cognitive Domain, New York: David McKay.

Classroom Observation Project (2011). Classroom Observation Project: Understanding and Improving Our Teaching Using the Reformed Teaching Observation Protocol (RTOP). http://serc.carleton.edu/NAGTWorkshops/certop/about.html (accessed 7 August 2013).

CWSEI Teaching Practices Survey (2013). http://www.cwsei.ubc.ca/resources/TeachPracticeSurvey.htm (accessed 29 October 2013).

Ebert-May D, Derting TL, Hodder J, Momsen JL, Long TM, Jardeleza SE (2011). What we say is not what we do: effective evaluation of faculty professional development programs. BioSci 61, 550–558.

Henderson C, Beach A, Finkelstein N (2011). Facilitating change in undergraduate STEM instructional practices: an analytic review of the literature. J Res Sci Teach 48, 952–984.

Hora MT, Oleson A, Ferrare JJ (2013). Teaching Dimensions Observation Protocol (TDOP) User’s Manual, Madison: Wisconsin Center for Education Research, University of Wisconsin–Madison. http://tdop.wceruw.org/Document/TDOP-Users-Guide.pdf (accessed 7 August 2013).

Jaccard P (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull de la Société Vaudoise des Sci Nat 37, 547–579.

Knight JK, Wood WB (2005). Teaching more by lecturing less. Cell Biol Educ 4, 298–310.

Landis JR, Koch GG (1977). The measurement of observer agreement for categorical data. Biometrics 33, 159–174.

Lemons PP, Lemons JD (2012). Questions for assessing higher-order cognitive skills: it’s not just Bloom’s. CBE Life Sci Educ 12, 47–58.

Mazur E (1997). Peer Instruction, Upper Saddle River, NJ: Prentice Hall.

Michael J (2006). Where’s the evidence that active learning works? Adv Physiol Educ 30, 159–167.

Millis B (1992). Conducting effective peer classroom observations. http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1249&context=podimproveacad (accessed 28 October 2013).


President’s Council of Advisors on Science and Technology (2012). Report to the President: Engage to Excel: Producing One Million Additional College Graduates with Degrees in Science, Technology, Engineering, and Mathematics, Washington, DC: Executive Office of the President. www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-engage-to-excel-v11.pdf (accessed 7 August 2013).

Prince M (2004). Does active learning work? A review of the research. J Eng Educ 93, 223–231.

Sawada D, Piburn MD, Judson E, Turley J, Falconer K, Benford R, Bloom I (2002). Measuring reform practices in science and mathematics classrooms: the Reformed Teaching Observation Protocol. Sch Sci Math 102, 245–253.

Singer SR, Nielsen NR, Schweingruber HA (2012). Discipline-Based Education Research: Understanding and Improving Learning in Undergraduate Science and Engineering, Washington, DC: National Academies Press.

Smith MK, Wood WB, Adams WK, Wieman C, Knight JK, Guild N, Su TT (2009). Why peer discussion improves student performance on in-class concept questions. Science 323, 122–124.

Smith MK, Wood WB, Krauter K, Knight JK (2011). Combining peer discussion with instructor explanation increases student learning from in-class concept questions. CBE Life Sci Educ 10, 55–63.

Weiss IR, Pasley JD, Smith PS, Banilower ER, Heck DJ (2003). Looking Inside the Classroom: A Study of K–12 Mathematics and Science Education in the United States, Chapel Hill, NC: Horizon Research.

West EA, Paul CA, Webb D, Potter WH (2013). Variation of instructor-student interactions in an introductory interactive physics course. Phys Rev ST Phys Educ Res 9, 010109.

Wieman C, Perkins K, Gilbert S (2010). Transforming science education at large research universities: a case study in progress. Change 42, 7–14.
