
Review of Educational Research

Fall, 1981, Vol. 51, No. 3, Pp. 281-309

Student Ratings of Instruction and Student Achievement: A Meta-analysis of Multisection Validity Studies

Peter A. Cohen
Dartmouth College

The present study used meta-analytic methodology to synthesize research on the relationship between student ratings of instruction and student achievement. The data for the meta-analysis came from 41 independent validity studies reporting on 68 separate multisection courses relating student ratings to student achievement. The average correlation between an overall instructor rating and student achievement was .43; the average correlation between an overall course rating and student achievement was .47. While large effect sizes were also found for more specific rating dimensions such as Skill and Structure, other dimensions showed more modest relationships with student achievement. A hierarchical multiple regression analysis showed that rating/achievement correlations were larger for full-time faculty, when students knew their final grades before rating instructors, and when an external evaluator graded students' achievement tests. The results of the meta-analysis provide strong support for the validity of student ratings as measures of teaching effectiveness.

This paper is based on the author's doctoral dissertation conducted at The University of Michigan. I would like to thank James Kulik, Wilbert McKeachie, Robert Blackburn, and David Starks for their valuable comments and suggestions.

The literature dealing with student ratings of college instruction is voluminous. Hundreds of articles have been published concerning issues such as the use of ratings, their reliability and validity, and potential biasing factors. Because of conflicting findings in this literature, however, it is difficult for reviewers to discern general trends.
Perhaps the most critical question about student ratings of instruction is whether
they are valid: whether they actually measure teaching effectiveness. Although
teaching effectiveness is difficult to define, it is generally thought of as the degree to
which an instructor facilitates student achievement (McKeachie, 1979). It can be
further operationalized as the amount students learn in a particular course. This is at
best a crude index of teaching effectiveness because a number of factors outside the
teacher's control—student ability and motivation, for example—affect the amount
students learn. Nonetheless, if student ratings are to have any utility in evaluating
teaching, they must show at least a moderately strong relationship to this index.
Some researchers question the validity of student ratings. Chandler (1978) and
Sheehan (1975), for example, oppose the use of student ratings for administrative
decisions because of "biasing factors" inherent in the ratings. From their perspective,



The weight of the evidence, however, suggests that student ratings are not influenced
to an undue extent by external factors such as student characteristics, course
characteristics, or teacher characteristics (cf. McKeachie, 1979).
Many investigators in this area have studied the relationship between ratings and
student learning. There is by no means, though, total agreement on the extent of this
relationship. In fact, Kulik and McKeachie (1975) state that "the most impressive
thing about studies relating class achievement to class ratings of instructors is the
inconsistency of the results" (p. 235). While some investigators have found strong
positive correlations between ratings and student learning (e.g., Centra, 1977; Costin,
1978; Frey, 1973), others have found equally strong negative correlations (e.g.,
Bendig, 1953b; Rodin & Rodin, 1972). Reviewers acknowledge that in general there
seem to be small to moderate correlations between ratings and learning (Kulik &
Kulik, 1974; Kulik & McKeachie, 1975; McKeachie, 1979; Seibert, 1979).
There are numerous unanswered questions concerning this body of literature. Can
overall conclusions be drawn concerning the validity of student ratings as they relate
to student learning? Or does the relationship vary depending on different circum-
stances? Does the relationship between ratings and student learning depend on the
type of rating dimension, instructor experience, subject matter, or the type of
correlation coefficient computed? At this point we do not know what factors
contribute to the diversity of findings in this area. Our ignorance is not due to the
lack of investigations conducted in this area. The relationships between ratings and
achievement have been investigated many times by many investigators. Rather, the
problem is that traditional research reviews have failed to collect studies systemati-
cally and to synthesize their results effectively.
In his presidential address to the American Educational Research Association,
Glass (1976) described an alternative to the conventional review. He referred to his
method as meta-analysis, or the analysis of analyses. He defined this method formally
as the statistical analysis of a large collection of results from individual studies for
the purpose of integrating findings. Reviewers who carry out meta-analyses first
locate studies of an issue by clearly specified procedures. They then characterize the
outcomes and features of these studies in quantitative or quasi-quantitative terms.
Finally, meta-analysts use multivariate techniques to describe findings and relate
characteristics of the studies to outcomes.
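As a concrete (and purely illustrative) picture of the coding step Glass describes, a single multisection course might be recorded as a small data record like the sketch below; the field names are assumptions introduced here for illustration, not the coding scheme used in the present study.

    from dataclasses import dataclass

    @dataclass
    class MultisectionCourse:
        """One coded finding for a rating/achievement meta-analysis.

        Field names are illustrative only; they do not reproduce the
        coding sheet used in the present study.
        """
        study: str                   # e.g., "Sullivan & Skanes (1974)"
        n_sections: int              # number of course sections
        instructor_experience: str   # "graduate student", "faculty", or "both"
        ratings_before_grades: bool  # ratings collected before final grades were known
        external_grader: bool        # achievement tests graded by an external evaluator
        effect_size_r: float         # class-level rating/achievement correlation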
In the years since Glass's address, a number of researchers have used this method
to synthesize results of psychological and educational research. For example, recent
reports of the use of meta-analysis examined effects in the following areas: mid-term
student-rating feedback (Cohen, 1980a); class size and achievement (Glass & Smith,
1979); gender differences in nonverbal communication (Hall, 1978); individualized
instruction at the college level (Cohen, Ebeling, & Kulik, 1981; Kulik, Cohen, &
Ebeling, 1980; C-L. Kulik, Kulik, & Cohen, 1980; J. A. Kulik, Kulik, & Cohen,
1979a, 1979b, 1980); open versus traditional education (Peterson, 1979); psychother-
apy and counseling (Smith & Glass, 1977); and experimenter effects (Rosenthal,
1976). A more detailed review of meta-analytic methodology and its application is
presented in Cohen (1980b).
The present study used Glass's method to synthesize research on the relationship
between student ratings of instruction and student achievement. The research in-
cluded in this synthesis came from "field" studies of actual college classes. Although most studies were carried out in lower-division courses, a wide variety of subject matter areas was sampled.


This meta-analytic research will enhance our understanding of the student-rating
literature in three ways: First, the synthesis will lead to general conclusions on the
overall relationship between ratings and achievement. Second, the research will show
the conditions under which the relationship is positive or negative, weak or strong.
Finally, the meta-analysis will provide some idea about the representativeness of the
literature in this area: about areas that have been studied thoroughly and areas that
have been studied too little. The results of the synthesis will be of use to administrators
and faculty members who use ratings to improve teaching. They will also be of use
to educational researchers who need a better picture of the state-of-the-art in research
on the evaluation of teaching.

Validity of Student Ratings


The increasing popularity of student ratings as measures of instructional effectiveness has focused a great deal of attention on their validity. In
general, researchers have used a criterion-related approach to establish validity,
demonstrating a relationship between student ratings and other measures of teaching
effectiveness. Some of the criteria against which student ratings have been evaluated
are: (1) ratings made by faculty colleagues, (2) ratings made by administrators, (3)
faculty self-ratings, (4) ratings made by alumni, and (5) student achievement. Because
we lack a universal definition of good teaching, however, the criterion-related
approach is at best limited. More appropriate to student ratings is an approach based
on construct validation (Campbell, 1960). Here it is important, as Marsh and Overall
(1980) suggest, that ratings be correlated with numerous teaching effectiveness criteria
and uncorrelated with factors assumed to be irrelevant to quality teaching (i.e.,
student, course, and instructor characteristics). This literature is reviewed more
thoroughly elsewhere (Centra, 1979; Costin, Greenough, & Menges, 1971; Doyle,
1975; Feldman, 1976; Kulik & Kulik, 1974, Marsh, 1980; McKeachie, 1979; Seldin,
1980).
Even though there is a lack of unanimity on a definition of good teaching, most
researchers in this area agree that student learning is the most important criterion of
teaching effectiveness. As McKeachie (1979) put it, "we take teaching effectiveness
to be the degree to which one has facilitated student achievement of educational
goals" (p. 385). Indeed, most research on the validity of student ratings has focused
on the relationship between these ratings and student achievement. Because of
conflicting findings, however, it is difficult to draw firm conclusions. Furthermore,
several complexities in this research literature add to the difficulty of drawing overall
generalizations concerning the validity of student ratings.
One complexity is the variety of ways of assessing student achievement. The most
appropriate way of assessing student learning in a validity study is by using some
sort of achievement measure, typically a course final examination. In these cases, the
researcher should control for initial student ability. Some investigators prefer other
criteria for evaluating student achievement, such as the final grade received in the
course. In some cases, expected grades are even used as the criterion of learning.
Unless grades are based directly on objective achievement criteria, however, they
should not be used as measures of student achievement.


A second complexity in the rating/achievement validity literature is the frequent confusion concerning the appropriate unit of analysis on which to base generaliza-
tions. Some investigators use the student as the unit of analysis, correlating an
individual student's achievement with his or her rating of the course or instructor.
Other investigators use the class as the unit of analysis, relating mean class achieve-
ment with the mean class instructor rating. Research reviews often lump both kinds
of studies together. These two types of studies, however, address distinctly different
questions. Researchers using the individual student as the level of analysis are not
asking (although they may intend to) whether the teachers who receive high student
ratings are also the ones who contribute to student learning. Rather, their research
design enables them to determine whether students who learn more than other
students, regardless of the class they are in, give higher ratings to instructors and
courses. Unfortunately, this type of analysis does not answer our validity concerns.
To do this we need to know the relationship between ratings and student achievement
for individual teachers. That is, do student ratings differentiate to any degree among
teachers in terms of their contribution to student learning? This question can only be
answered by using the class (or instructor) as the unit of analysis in a validity design.
A third complexity comes from the multidimensional structure of most rating
instruments. Not all of the dimensions should relate to student achievement to the
same extent. For example, we would anticipate that the "Skill" dimension, which
taps instructor competence, is related to student achievement. On the other hand,
there is no reason to expect that "Course Difficulty" should be related to achievement.
Yet, some reviewers draw conclusions by "summing over" the validity coefficients of
different teaching dimensions. Determining the extent to which various rating
dimensions relate to student achievement has not been accomplished systematically
up to this point.
A final source of complexity in rating/achievement studies is the variety of settings
in which the studies are conducted. For instance, the instructors in some studies are
full-time faculty members; the instructors in other studies are teaching assistants.
Ratings are made before students know their final grades in some studies; ratings in
other studies are made after students know their grades. Some studies use teacher-
made rating items; others use standardized rating forms. Even when a reviewer
attempts to account for one or two specific study features, the conclusions are rarely
based on more than a handful of studies.
Table I lists the major reviews on the student rating/achievement relationship. For
each review, the table shows the number of multisection validity design studies
included in the review, the conclusions of the review, and the limitations of the
review. Although the reviewers use somewhat different sets of studies, their conclu-
sions are very similar: that there is a low to moderate correlation between student
ratings of instruction and student achievement.
A main purpose of the present study is not only to summarize statistically the
validity coefficients for different rating dimensions, but also to assess the effect of
particular study characteristics that may influence the magnitude of the relationship
between ratings and achievement. The meta-analytic methodology employed in this
study will help determine the impact of study features that other investigators have
suggested may affect the relationship. Specific characteristics of the validity studies
which have been addressed by researchers and merit further discussion are: assignment of students to teachers, presence of a control for the students' initial ability, instructor experience, type of rating items or instrument used, and the number of sections used.


TABLE I
Major Reviews on the Student Rating/Achievement Relationship

Centra (1979)
Conclusions: Relationship between ratings and achievement significant, but limited range of both variables may suppress correlations

Costin, Greenough, & Menges (1971)
Conclusions: No comment on the rating/achievement relationship
Limitations: Some studies use grades as achievement criterion; not all studies use class as unit of analysis; no distinction between rating dimensions; no account for study feature effects

Doyle (1975)
Conclusions: Fairly consistent low-to-moderate positive correlation between general ratings and student learning; Skill dimension related to learning; Rapport dimension not related to learning
Limitations: No account for study feature effects

Follman (1974)
Conclusions: Relationship between ratings and achievement about 0.40 across all school levels, a "low" relationship
Limitations: Not all studies use class as unit of analysis; no distinction between rating dimensions; no account for study feature effects

Gage (1974)
Conclusions: Correlations between ratings and achievement are positive and low to medium in magnitude; ratings are valid as indicators of student learning
Limitations: Not all studies use class as unit of analysis

Kulik & Kulik (1974)
Conclusions: Inconsistency of results; median correlation 0.27 (adjusted), 0.23 (unadjusted) for overall rating; tendency for students of highly rated teachers to outscore students of low-rated teachers on final exam

Kulik & McKeachie (1975)
Conclusions: Inconsistency of results
Limitations: No account for study feature effects

Marsh (1980)
Conclusions: Overall ratings show low to moderate correlations with achievement; lack of consistency of which evaluation factors most highly related to learning

McKeachie (1979) (7 multisection validity studies)
Conclusions: Validity of ratings reasonably encouraging with respect to achievement on course examinations

Mintzes (1977) (7 multisection validity studies)
Conclusions: Weak positive correlation coefficients, averaging 0.20 to 0.30
Limitations: Not all studies use class as unit of analysis; no distinction between rating dimensions; no account for study feature effects

Seibert (1979) (5 multisection validity studies)
Conclusions: Students rate most highly instructors from whom they learn most
Limitations: Not all studies use class as unit of analysis


Assignment of Students to Teachers


Leventhal (1975) maintained that the strongest student rating validation design
involves random assignment of students to different sections of a multisection course.
When random assignment is achieved, between-section differences in student
achievement can be attributed to differences in teachers. However, in most situations
students are not randomly assigned to classes; rather, they self-select into sections.
When this is the case in a multisection validity design study, it can be difficult to
draw inferences concerning the rating/achievement relationship. Marsh and Overall
(1980) claimed that most studies of this sort provide inadequate controls for possible
differences in students' initial ability and motivation. In other words, neither equiv-
alence of sections on some measure of initial ability nor statistical control for initial
ability on the achievement criterion is demonstrated. Leventhal and his colleagues
(Leventhal, 1975; Leventhal, Abrami, & Perry, 1977; Leventhal, Abrami, Perry, &
Breen, 1975) provided evidence that students can significantly differ across sections
on both biographical variables and section selection reasons. These researchers
suggested that section selection based on factors such as teacher ability and teacher
reputation confounds the relationship between ratings and achievement.

Control for Initial Student Ability


With one exception, reviewers have not systematically accounted for whether or
not the achievement measure was adjusted for initial student ability in the multisec-
tion validity literature. For the studies they reviewed, Kulik and Kulik (1974) showed
that this type of control made little difference in summarizing the overall rating/
achievement relationship. Most validity studies that account for students' initial
ability or aptitude make use of a part correlation, where ratings are correlated with
a residualized measure of achievement. Rarely do studies use a partial correlation where both ratings and achievement are adjusted for initial student ability, although this was the case in the Rodin and Rodin (1972) study.



Instructor Experience
Sullivan and Skanes (1974) reported different validity coefficients for experienced
instructors (those who had taught more than 1 year) and inexperienced instructors
(those who had taught less than 1 year). For experienced full-time instructors, the
correlation between ratings and student achievement was .68; for inexperienced
teachers the correlation was only .13. For a psychology course, Sullivan and Skanes
were able to compute separate coefficients for full-time psychology faculty and
graduate student (part-time) instructors. Again, they found that for the full-time
faculty, the validity coefficient was quite high (.53); for the graduate student instruc-
tors, the validity coefficient was trivial (r = .01). A number of reviewers (e.g., Kulik
& Kulik, 1974; Seibert, 1979) also suggested that the different degrees of instructor
experience found in different validity studies contributed to the diversity of their
results.
Rating Instrument Bias
Although some multisection validity studies made use of standardized rating
instruments and scales, others used teacher-constructed scales or even single-item
ratings. Marsh and Overall (1980) maintained that the lack of consistent results for
this body of studies may be due to a lack of well-defined factor structures in the
instruments used in many studies.

Number of Sections
A number of reviewers have commented on the small number of sections on which
multisection validity studies are typically based. For instance, Vecchio (1980) said
that he places "little confidence" in the magnitude of the obtained relationships
because of the instability of correlations derived from small sample sizes. Similarly,
Marsh and Overall (1980) concluded that the small number of sections in most
validity studies is not adequate and contributes to the variability in findings. Kulik
and McKeachie (1975) further pointed out that large correlations (positive or
negative) tend to occur when sample sizes are small; more modest correlations appear
when adequate sample sizes are used. Finally, Doyle (1975) suggested that in order
to derive a stable validity coefficient, at least 30 sections should be used in a
multisection study.

Other Study Characteristics


A variety of other study features may affect the magnitude of the rating/achieve-
ment relationship. The time at which the ratings are administered to students could
influence their ratings of instruction. Another variable that should be accounted for
in the multisection validity studies is teacher autonomy. That is, to what extent do
the teachers who are being evaluated have control over the learning environment? In
addition, characteristics of the achievement criterion (e.g., departmental versus
standardized exam; essay versus objective exam; who evaluates the exam) and the
course (e.g., introductory versus advanced; subject matter) need to be considered.
Finally, study characteristics such as study quality, type of institution, and publication features (e.g., source of publication; study year) that have correlated with outcomes in other meta-analyses will be explored in the present study.



Methods
This section describes the procedures used to locate studies, to determine which
studies would be included in the analyses, to describe study characteristics, to
quantify outcomes of these studies, and to analyze the data.

Locating Studies
The first step in the meta-analysis was to locate as many studies as possible that
dealt with the relationship between student ratings of instruction and student
achievement. The primary sources for these studies were the major reviews listed in
Table I and three library data bases computer-searched through Lockheed's
DIALOG Online Information Service. The data bases included: (a) Comprehensive
Dissertation Abstracts; (b) ERIC, a data base on educational materials from the
Educational Resources Information Center, consisting of the two files Research in
Education and Current Index to Journals in Education; and (c) Psychological Abstracts.
The investigator developed a special set of key words for each computer search in
order to take into account the distinctive features of the different data bases. For
example, in the ERIC data base the key words included: "academic achievement" or
"grades," "higher education," and "course evaluation" or "student evaluation of
teacher performance." Branching from the bibliographies in articles located through
the original searches provided a third source of studies for the meta-analysis. In
addition, the investigator monitored recent issues of relevant educational and psy-
chological journals.
In all, the bibliographic searches yielded a total of approximately 450 titles. Most
of the articles, however, failed in one way or another to meet the criteria established
for the analysis. On the basis of information about the articles contained in titles or
abstracts, the initial pool of 450 titles was reduced to 105 potentially useful documents.
The investigator obtained copies of these 105 documents and read them in full. Of
the 105 reports, 41 contained data that could be used in the meta-analysis. These 41
documents reported on 68 separate multisection courses relating student ratings of
instruction to student achievement. The 41 studies are listed in Table II.

Criteria for Including Studies


To be included in the final sample, a study had to meet three basic criteria: First,
the study had to provide data from actual college classes. That is, data had to come
from "field studies" rather than experimental analogues of teaching such as the "Dr.
Fox studies" (cf. Ware & Williams, 1975). Second, the unit of analysis in the study
had to be the class or instructor rather than the individual student. In other words,
the study had to provide data from which correlations between mean class (instructor)
ratings and mean class achievement could be derived. Third, data had to be based on
a multisection course with a common achievement measure used for all sections. The
analysis did not include data from university-wide course samples or data from
classes using different criteria for measuring achievement.
A methodological problem that faces the meta-analyst is how to determine the
number of effect sizes or "findings" from the pool of studies.


TABLE II
Studies Used in the Meta-analysis
Study: Overall Ratings Correlated with Achievement; Specific Ratings Correlated with Achievement

Bendig (1953a): OC, OI
Bendig (1953b): OI
Benton & Scott (1976): OC, OI; SK, R, ST, D, I, E, SP
Bolton, Bonge, & Man (1979): OC, OI; SK, R, ST, E
Borg & Hamilton (1956): OI
Braskamp, Caulley, & Costin (1979): OC, OI; SK, R, ST, I
Bryson (1974): OI; SK, R, ST, I, F, E
Centra (1977): OC, OI; SK, R, ST, D, E
Chase & Keene (1979): OI; SK, ST, D, E, SP
Cohen & Berger (1970): OC, OI; SK, ST, D, I
Costin (1978): OI; SK
Crooks & Smock (1974): OI
Doyle & Crichton (1978): OI; SK, R, I, SP
Doyle & Whitely (1974): OI; SK, R
Elliott (1950): OC, OI; SK, R, F, E
Ellis & Rickard (1977): OC, OI; SK
Endo & Della-Piana (1976): OI; SK, E
Frey (1973): OI; SK, R, ST, D, E, SP
Frey (1976): OI; SK, R, ST, D, E, SP
Frey, Leonard, & Beatty (1975): OI; SK, R, ST, D, E, SP
Greenwood et al. (1976): OI; ST, D
Grush & Costin (1975): OI; SK
Hoffman (1978): OC, OI; SK, I, E
Marsh, Fleiner, & Thomas (1975): OC, OI; SK, R, ST, D, SP
Marsh & Overall (1980): OC, OI; ST, D, I, E, SP
McKeachie, Lin, & Mann (1971): OI; SK, R, ST, D, I, F
Mintzes (1977): OI; SK, R, ST, D, I, F, SP
Morsh, Burgess, & Smith (1956): OI; SK, R
Murdock (1969): OI
Rankin (1965): OC; SP
Remmers, Martin, & Elliott (1949): OC, OI; SK, F, E
Reynolds & Hansvick (1978): OI
Rodin & Rodin (1972): OI
Rubinstein & Mitchell (1970): OC, OI; I
Solomon, Rosenberg, & Bezdek (1964): OI; R, SP
Sorge & Kline (1973): OI; SK, R, ST, I
Spencer & Dick (1965): OI
Sullivan & Skanes (1974): OI
Turner & Thompson (1974): OI; D
Wherry (1952): OI
Whitely & Doyle (1979): OI

Note. Rating designations are: OC = Overall Course, OI = Overall Instructor, SK = Skill, R = Rapport, ST = Structure, D = Difficulty, I = Interaction, F = Feedback, E = Evaluation, SP = Student Progress.


In the present meta-analysis, most studies reported a single effect size. Only 10 of the 41 studies provided
data for more than one multisection course. For these studies, averaging effect sizes
across multisection courses would provide greater independence among studies, but
at a potential loss of conceptual meaning. Therefore, the investigator chose to
calculate an effect size for each multisection course rather than for each paper.

Describing Study Characteristics


To characterize studies more precisely, 20 variables were defined. Table III lists
the 20 variables, the coding categories for each, and the number of multisection
courses in each category. The first 12 variables described methodological features of
the study. These variables covered both internal and external threats to validity
(Bracht & Glass, 1968; Campbell & Stanley, 1963). The next six variables character-
ized the ecological conditions under which the study took place. The final two
variables described publication features of the study. The 20 variables and criteria
for coding were as follows:
1. Assignment of students to teachers. Were students assigned to teachers randomly
or by administrative procedures? If students were administratively assigned, was
there evidence that sections were equivalent in terms of students' initial ability?
2. Control for scoring bias in the achievement criterion. Was an objective or
nonobjective (e.g., essay) test used to measure student achievement?
3. Control for author bias in the achievement criterion. Did the academic department
or the experimenter develop the achievement test, or was a standardized examination
used to measure student achievement?
4. Control for teacher bias in evaluating achievement. Did teachers grade the
achievement tests for their own class, or were tests graded by external evaluators?
5. Control for bias in the rating instrument. Were single-item ratings or teacher-
constructed rating items used to assess student attitudes toward instruction, or was a
standardized rating instrument with an underlying factor structure used?
6. Statistical control for initial student ability. Was the rating/achievement validity
coefficient based on the correlation between student ratings of instruction and the
raw achievement score, or was a part correlation computed between ratings and a
residualized achievement score?
7. Control for prior knowledge of the instructor. Did students know which teachers
were teaching different sections of a course prior to their enrollment in a specific
section?
8. Control for time at which ratings were administered. Were rating data collected
before or after students knew their final grade or examination score?
9. Length of instruction. Did the instruction occur for a complete academic term or
for less than a term?
10. Teacher autonomy. Were the teachers responsible for all classroom instruction
or only a component (e.g., discussion section) of instruction?
11. Number of sections. This variable showed the number of sections on which the
rating/achievement correlations were based.
12. Overall study quality. This variable was operationally defined by summing scores from the other 11 methodological variables. Number of sections was recoded to correspond to a three-point variable (a coding sketch appears after this list).
13. Content emphasis on "hard" discipline. Was the academic area concerned with


TABLE III
Categories for Describing Studies and Number of Multisection Courses in Each Category
Coding Category Number of Courses
Methodological Features
Assignment of students to sections
No evidence of equivalence 47
Evidence of equivalence 7
Random assignment 14
Control for scoring bias in achievement criterion
Nonobjective test 9
Objective test 38
Control for author bias in achievement criterion
Departmental test 58
Standardized test 6
Control for bias in evaluating achievement
Tests graded by teacher 10
Tests graded by external evaluator 28
Control for rating instrument bias
Nonstandardized ratings 23
Standardized ratings 45
Statistical control for ability
No 43
Yes 25
Control for prior knowledge of instructor
No 26
Yes 6
Time at which ratings administered
After final grades 4
Before final grades 56
Length of instruction
Fraction of a semester 2
Whole semester 66
Teacher autonomy
Responsible for component of instruction 13
Responsible for all instruction 55
Number of sections
Less than 10 24
10-19 19
More than 19 25
Overall study quality
Low 7
Moderate 54
High 7
Ecological Conditions
Content emphasis on "hard" discipline
"Soft" discipline 33
"Hard" discipline 35
Content emphasis on "pure" knowledge
Applied 7
Pure 61


Content emphasis on "life" studies


Nonlife 39
Life 29
Course level
Introductory 65
Other 3
Institutional setting
Comprehensive, liberal arts, or community college 23
Doctorate-granting institution 45
Instructor experience
Graduate students 22
Graduate students and faculty 4
Full-time faculty 41
Publication Features
Source of study
Unpublished 13
Published 55
Publication year
1940-1949 1
1950-1959 6
1960-1969 4
1970-1979 57

a single paradigm ("hard") or was it nonparadigmatic ("soft")? Biglan's (1973) three-


dimensional taxonomy of academic disciplines provided the basis for this classifica-
tion and the classification made in variables 14 and 15.
14. Content emphasis on "pure" knowledge. Was the academic area concerned with
pure knowledge or application?
15. Content emphasis on "life" studies. Was the academic area concerned with life
systems or nonlife systems?
16. Course level. Was the multisection course an introductory-level or more
advanced course?
17. Institution. Categorization of institutional level was based on the Carnegie
taxonomy of institutions of higher education (Carnegie Commission on Higher
Education, 1976). Institutions were coded as either doctorate-granting or other
institutions.
18. Instructor experience. Were the course instructors graduate students, faculty
members, or both graduate students and faculty members?
19. Source of study. Was the report published or unpublished?
20. Study year. This variable showed the year in which the report was documented.
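The sketch below illustrates one way the overall quality composite in variable 12 could be computed; the ordinal scoring of the individual methodological variables is an assumption made here, and only the number-of-sections cut points follow Table III.

    def overall_study_quality(method_codes, n_sections):
        """Sum the codes of the other 11 methodological variables and add a
        three-point recoding of number of sections (cut points from Table III).
        The ordinal coding of each variable is assumed, not taken from the article.
        """
        if n_sections < 10:
            sections_code = 1
        elif n_sections <= 19:
            sections_code = 2
        else:
            sections_code = 3
        return sum(method_codes) + sections_code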
The author coded all the studies. To check coding reliability, a graduate student
who was experienced in meta-analytic techniques was also trained to code study
features. A total of 21 of the 41 studies were randomly selected to be coded
independently by the trained rater. For the 20 study feature variables, agreement
coefficients ranged from .71 to 1.00, with a median coefficient of .94.
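The article does not state how the agreement coefficients were calculated; a simple proportion-of-agreement index for one study-feature variable, offered only as an illustration, could be computed as follows.

    def proportion_agreement(author_codes, rater_codes):
        """Share of double-coded studies on which the two coders assigned the same
        category for one study-feature variable (an assumed, simple index; the
        article's agreement coefficients may have been computed differently).
        """
        pairs = list(zip(author_codes, rater_codes))
        return sum(a == b for a, b in pairs) / len(pairs)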


Quantifying Study Outcomes

The next task in the meta-analysis was to describe quantitatively the outcomes of
the studies in the sample. The major outcomes of interest were: (1) the relationship
between student achievement and overall instructor rating; (2) the relationship
between student achievement and overall course rating; and (3) the relationship
between student achievement and different rating dimensions commonly found in
factor-analytic studies of student ratings.
The student achievement measure most commonly used in the studies was a
common final examination. In some cases a cumulative point total based on a
number of tests throughout the course was used. Final grades were only used as an
indicator of achievement if they were based strictly on objective achievement criteria
(e.g., total points derived from criteria such as exams, papers, lab reports). If the
article presented data for more than one achievement measure, final examination
scores were given preference, followed by total points, and then final grades.
Rating data were collected for both overall ratings and more specific rating
dimensions. The overall ratings were of two types: an overall instructor rating and an
overall course rating. Overall instructor rating data came from either a single rating
item concerning overall teaching effectiveness (e.g., "The instructor is an excellent
teacher") or from an average of all items or dimensions relating to the instructor's
effectiveness in a particular study. Overall course ratings were derived similarly.
Most commonly, a single rating item was used (e.g., "This is an excellent course").
Data were also collected for six dimensions of teaching. Kulik and McKeachie
(1975) identified four of these dimensions as "common" factors in their review of
factor-analytic studies of student ratings. These four dimensions are Skill, Rapport,
Structure, and Difficulty. The other two dimensions, Interaction and Feedback, were
described and interpreted by Isaacson et al. (1964). The six dimensions were defined
as follows:
1. Skill. The Skill dimension represents the overriding quality to which students
respond when rating instructors. Typical items are: "The instructor has a good
command of the subject matter." "The instructor gives clear explanations." "The
instructor teaches near the class level."
2. Rapport. The Rapport dimension includes items dealing with a teacher's em-
pathy, friendliness, approachability, and accessibility. Sample items are: "The instruc-
tor is friendly." "The instructor is permissive and flexible." "The instructor is
available to talk with students outside of class."
3. Structure. The Structure dimension describes how well the instructor planned
and organized the course. Typical items are: "The instructor has everything going
according to schedule." "The instructor uses class time well." "The instructor explains
course requirements."
4. Difficulty. The Difficulty dimension deals with the amount and difficulty of the
work the teacher expects of students. Typical items are: "The instructor assigned
difficult reading." "The instructor asked for more than students could get done."
"This course required more work than others of comparable credit hours."
5. Interaction. The Interaction dimension measures the degree to which students
are encouraged to share their ideas and become actively involved in class sessions.
Typical Interaction items are: "The instructor encourages students to express various points of view." "The instructor encourages students to volunteer their own opinions." "The instructor facilitates classroom discussion."


6. Feedback. The Feedback dimension measures the instructor's concern with the
quality of students' work. Standard items for this dimension are: "The instructor tells
students when they have done a particularly good job." "The instructor checks to see
if students have learned well before going on to new material." "The instructor keeps
students informed of their progress."
In addition to these six dimensions, data were collected on students' self-ratings of
their learning and student attitudes toward the subject being studied. If a study
presented results with other rating dimensions, these additional results were also
recorded.

Data Analysis
The basic measure of effect size was Pearson's product-moment correlation. For
each rating dimension, mean class achievement was correlated with mean class
rating. Procedures outlined by Glass (1978) were used to convert various summary
statistics (e.g., t values, F values, chi-squared values) into product-moment correla-
tions. The use of these algebraic transformations resulted in a greater number of
usable studies in the final sample. Before conducting statistical analyses, Fisher's z-
transformation was applied to all correlation coefficients based on procedures
suggested by Glass and Stanley (1970). After performing the appropriate analysis,
Fisher Z scores were transformed back into the more interpretable correlation
coefficients.
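The following sketch illustrates the computational steps just described, assuming standard formulas for converting test statistics to correlations and for Fisher's z-transformation; the confidence-interval construction is likewise an assumption for illustration, not the article's original analysis code.

    import numpy as np
    from scipy import stats

    def t_to_r(t, df):
        """Convert a reported t statistic to a product-moment r (Glass-style conversion)."""
        return t / np.sqrt(t ** 2 + df)

    def f_to_r(f, df_error):
        """Convert a one-numerator-df F statistic to r (such an F equals t squared)."""
        return np.sqrt(f / (f + df_error))

    def mean_correlation(rs, alpha=0.05):
        """Average correlations on the Fisher z scale and back-transform the mean.

        The interval below uses the t distribution on the study-level z values;
        this is an assumption about how such intervals might be formed.
        """
        z = np.arctanh(np.asarray(rs, dtype=float))   # Fisher's z-transformation
        k = len(z)
        zbar, se = z.mean(), z.std(ddof=1) / np.sqrt(k)
        half = stats.t.ppf(1 - alpha / 2, k - 1) * se
        return np.tanh(zbar), (np.tanh(zbar - half), np.tanh(zbar + half))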
Two sets of analyses were performed on the data. The first set of analyses described
the overall size and significance of the rating/achievement correlations for the
different rating dimensions. The second set of analyses determined the effect of study
characteristics on the magnitude of the rating/achievement correlations using corre-
lational and multiple regression techniques.

Results
This section reports results of statistical analyses concerning the rating/achieve-
ment correlations. Findings are described in two areas: (a) overall effects and (b)
study characteristics and effect sizes.

Overall Effects
One of the major goals in meta-analysis is to reach overall conclusions about the
magnitude of effects. In this first set of analyses, descriptive statistics were used to
determine the overall size and significance of rating/achievement correlations for the
two general dimensions, seven specific teaching dimensions, and students' self-ratings
of their learning. The overall mean correlations, the number of multisection courses
on which the means are based, and the 95 percent confidence interval on the mean
population correlations are presented in Table IV.
Overall course rating. Correlations between an overall course rating and student
achievement were available for 22 of the 68 multisection courses located for this
meta-analysis. For 20 of the 22 courses, overall course rating was positively correlated
with student achievement; the correlation between overall course rating and student
achievement was negative in two courses.


TABLE IV
Mean Rating/Achievement Correlational Effect Sizes

Rating Dimension      N   Mean Correlation   95% Confidence Interval
Overall Course       22        0.47              0.09, 0.73
Overall Instructor   67        0.43              0.21, 0.61
Skill                40        0.50              0.23, 0.70
Rapport              28        0.31             -0.07, 0.61
Structure            27        0.47              0.11, 0.72
Difficulty           24       -0.02             -0.42, 0.39
Interaction          14        0.22             -0.36, 0.67
Feedback              5        0.31             -0.79, 0.94
Evaluation           25        0.23             -0.18, 0.58
Student Progress     14        0.47             -0.08, 0.80

A total of 11 of the 22 correlations were statistically significant, and all 11 showed significant positive correlations. If no
overall generalization about the relationship between general course rating and
student achievement was possible, one would expect most of the correlations to fall
around zero, with about half of the correlations in the positive direction and half in
the negative direction. Instead, a positive correlation was reported for a clear majority
of courses. Therefore, the null hypothesis of no relationship between overall course
rating and student achievement was rejected.
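The sign-count reasoning above can be made explicit with a simple binomial calculation; the test below is illustrative and is not one reported in the article.

    from scipy.stats import binomtest

    # Under the null hypothesis, a positive correlation is as likely as a negative one.
    # Twenty of the 22 course/achievement correlations were positive.
    result = binomtest(20, n=22, p=0.5, alternative="greater")
    print(f"P(20 or more positive out of 22 | p = .5) = {result.pvalue:.1e}")  # about 6e-05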
A measure of effect size, the mean correlation, permits a more precise description
of the relationship between overall course rating and student achievement. The
average correlation for the 22 courses was .47. Cohen (1977) operationally defined
correlational effects of this magnitude (i.e., .50) as "large." With r = .50, for example,
25 percent of the variance of one variable is associated linearly with variance in the
other. Correlations between college entrance examinations and college grades are of
this magnitude (e.g., Mauger & Kolmodin, 1975). Cohen states that in the behavioral
sciences, when one "anticipates a degree of correlation between two different
variables about as high as they come, this would by our definition be a large effect,
r = .50" (Cohen, 1977, p. 81). When r = .30 the effect is said to be medium in size.
A correlation of this size implies that nine percent of the variance in one variable is
attributable to variance in the other variable. According to Cohen, "this degree of
relationship would be perceptible to the naked eye of a reasonably sensitive observer"
(p. 80). The relationship between intelligence test scores of nonrelated children reared
together is of this magnitude. When r = .10, the effect is small. Only one percent of
the variance in one variable can be predicted by the variance of the other. According
to Cohen, relationships of this magnitude "would not be perceptible on the basis of
casual observation" (p. 79).
Although the correlation between overall course rating and student achievement
in the typical course was quite large, correlations varied from course to course. Figure
1 presents a distribution of correlations for the 22 courses. The figure shows that
nearly two-thirds of the multisection courses had moderately large positive correla-
tions.

FIGURE 1. Distribution of Course/Achievement correlations for 22 courses.
Overall instructor rating. A total of 67 of the 68 courses provided data from which
correlations between an overall instructor rating and student achievement could be
derived. For 59 of the 67 courses, overall instructor rating correlated positively with student achievement; for eight courses the correlation was negative.



For 31 courses
the correlation coefficient reached statistical significance, and in 30 of those courses
it was significantly positive. Under the null hypothesis of no relationship between
overall instructor rating and student achievement, these results are very unlikely.
The average correlation between overall instructor rating and student achievement
for the 67 multisection courses was .43, a moderately large effect. The 95 percent
confidence interval on the true population correlation ranged from .21 to .61. The
distribution of these 67 correlations is presented in Figure 2. Over half of the courses
had large positive correlations. Instructors whose students achieved the most were
also the ones who tended to receive the highest instructor ratings.
Skill. Correlations between Skill ratings and student achievement were generated
for 40 courses. Skill was positively correlated with student achievement in 37 courses;
it was negatively correlated with achievement in three courses. For 20 of the 40
courses the correlation coefficient was statistically significant, and in all of these
courses Skill ratings and achievement were positively related. For the 40 courses, the
average correlation equalled .50, a large effect.


FIGURE 2. Distribution of Instructor/Achievement correlations for 67 courses.

Rapport. Correlations between Rapport ratings and student achievement were available for 28 courses. In 23 courses Rapport was positively correlated to achieve-
ment; in four courses it was negatively correlated to achievement; and in one course
the correlation between Rapport and student achievement was zero. The correlation
was statistically significant for only six courses, and in each of these courses it was
significantly positive. The mean Rapport/achievement correlation for these 28
courses was .31, a moderate effect.
Structure. Correlations between Structure ratings and student achievement were
derived for 27 courses. In 24 courses Structure was positively correlated with
achievement; it was negatively correlated with achievement in three courses. The
correlation was statistically significant in a positive direction in nine courses; in none
of the courses was the correlation significantly negative. The average correlation
between Structure and student achievement was .47 for the 27 courses.
Difficulty. Twenty-four of the 68 courses provided correlations between Course
Difficulty and student achievement. In 12 courses Difficulty was positively related to
achievement; in 10 courses it was negatively related to achievement; and in two
courses the correlation was zero. In only one course was the correlation between Difficulty and achievement significant, and that was in a negative direction. The null hypothesis of no relationship between Difficulty and student achievement could not be rejected. The mean correlation between Difficulty and student achievement was -.02.


Interaction. For 14 courses, correlations between Interaction and student achieve-
ment were available. In 12 courses Interaction was positively related to achievement;
in one course it was negatively related to achievement; and in one course the
correlation between Interaction and achievement was zero. The correlation was
significant in four courses, and in each case it was significantly positive. The mean
correlation between Interaction and student achievement was .22 for the 14 courses.
Feedback. Only five courses provided correlations between Feedback and student
achievement. The correlations were positive for all five courses. However, only one
course showed a statistically significant positive correlation. The mean correlation
between Feedback and student achievement was .23 for the five courses.
Evaluation. One other rating dimension, Evaluation, was correlated with student
achievement in a number of studies. The Evaluation dimension measures the extent
to which students feel the evaluation instruments (e.g., papers, examinations) fairly
assess their ability. Twenty-five studies reported correlations between Evaluation and
student achievement. In 20 courses Evaluation was positively correlated with achieve-
ment; in three courses it was negatively correlated with achievement; and in two
courses the correlation between Evaluation and achievement was zero. Only four
courses showed statistically significant correlations, and in each case Evaluation was
positively correlated with student achievement. The mean correlation between Eval-
uation and student achievement was .23 for the 25 studies.
Student progress. It was also of interest to determine how well students' self-ratings
of their learning corresponded with their achievement. Correlations between Student
Progress and achievement were available for 14 courses. In 10 courses Student
Progress was positively correlated with student achievement; in two courses it was
negatively correlated; and in two courses the correlation between Student Progress
and achievement was zero. The correlations were statistically significant in four
courses, and in each case Student Progress was positively correlated with achieve-
ment. The mean correlation between Student Progress and student achievement was
.47 for the 14 courses.
Summary of overall effects. We can be relatively certain that the general course
and instructor dimensions relate quite strongly to student achievement. For both of
these dimensions, the mean rating/achievement correlational effect size is moderately
large, and the 95 percent confidence intervals around the true population means do
not span zero. This magnitude of effect size does not hold up for all teaching
dimensions, however. While large effect sizes are found for the Skill and Structure
dimensions, other dimensions such as Rapport, Interaction, Feedback, and Evalua-
tion show more modest effects. The Course Difficulty dimension shows no relation-
ship with student achievement. Finally, students' self-ratings of their learning corre-
late quite highly with student achievement.

Study Characteristics and Effect Sizes


Another major goal in meta-analysis is to explain the variation in effects among
the different studies. To accomplish this, correlational and multiple regression analyses were conducted to determine whether studies that reported large effect sizes differed systematically from those which produced small effect sizes.


As a first step
in this set of analyses, zero-order correlations were computed between the 20 study
characteristic variables and the rating/achievement effect sizes. Then, to investigate
the possibility that a combination of variables might predict effect sizes more
accurately than a single predictor, a hierarchical multiple regression analysis (Cohen
& Cohen, 1975) was conducted. The hierarchical model requires the analyst to
specify in advance the order in which the independent variables enter the regression
equation. The model determines the partial correlation coefficients of each inde-
pendent variable at the point where the variable enters the equation, while also
indicating the cumulative R². Thus, the hierarchical procedure shows the unique
contribution of a specific independent variable to the total variance of the dependent
variable, when previously entered independent variables have been partialled.
The investigator selected the hierarchical multiple regression strategy for two
reasons. First, an examination of the correlation matrix showed that many of the
study characteristic variables were substantially intercorrelated. This problem of
multicollinearity is best dealt with by an ordered variance partitioning procedure
(Cohen & Cohen, 1975). Second, the hierarchical model is most appropriate when
independent variables can be ordered with regard to their causal priority. For the
present meta-analysis, the independent variables in the regression came from the set
of 19 variables used to describe characteristics in the sample of multisection courses.
The dependent variable in the regression analysis was the rating/achievement
correlation for the overall instructor rating dimension. Study characteristics that have
been hypothesized by other researchers to influence the magnitude of the rating/
achievement correlation were initially entered into the regression model. Following
this set of variables, the remaining study characteristics were entered. This resulted
in the following hierarchical ordering: Set A (knowledge of instructor, student
assignment, instructor experience, instructor autonomy, number of sections, control
for ability, timing of ratings, rating instrument bias); Set B (hard science, pure
knowledge, life studies, author bias, scoring bias, evaluation bias); and Set C
(institution, study year, source of study). The overall study quality variable was not
used in the regression analysis because it was based on other entered variables. In
addition, two study characteristic variables—course level and length of instruction—
could not explain the variation in rating/achievement correlations because there was
little variation on these study characteristics, and therefore, they were not used in the
regression analysis.
The hierarchical multiple regression procedure identified which independent vari-
ables significantly contributed to the variance of the overall instructor rating/achieve-
ment correlations. These significant variables were then entered into a separate
multiple regression equation. From this regression analysis, a prediction equation
and percent of total variance accounted for were computed.
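A minimal sketch of such an ordered variance-partitioning procedure appears below; the use of ordinary least squares via statsmodels and all variable names are assumptions introduced for illustration, not the article's analysis code.

    import numpy as np
    import statsmodels.api as sm

    def hierarchical_r2(y, blocks):
        """Fit nested OLS models, adding one block (set) of predictors at a time,
        and return the cumulative R-squared after each block is entered."""
        X = np.ones((len(y), 1))          # start with the intercept only
        cumulative_r2 = []
        for block in blocks:
            X = np.column_stack([X, block])
            cumulative_r2.append(sm.OLS(y, X).fit().rsquared)
        return cumulative_r2

    # Usage (hypothetical variables): blocks correspond to Sets A, B, and C of study
    # characteristics; y holds the Fisher z-transformed instructor rating/achievement
    # correlations.
    # r2_by_set = hierarchical_r2(effect_sizes, [set_a_matrix, set_b_matrix, set_c_matrix])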
The correlations between study characteristics and overall instructor rating/
achievement effect sizes for 67 courses are presented in Table V. Three variables
correlated significantly with effect size: control for bias in evaluating achievement;
time at which the ratings were administered; and instructor experience. The results
of the hierarchical regression analysis showed that only these three variables contrib-
uted significantly to the variance in effect sizes. Together, the three study characteristic variables accounted for 31 percent of the variance in overall instructor rating/achievement correlations.


The regression model including these variables produced
the following equation (t values given in parentheses):
1.183 + .088 (instructor experience; t = 1.58) - .853 (timing of ratings; t = 3.96) + .419 (evaluation bias; t = 2.72).
This model shows that for graduate student instructors, the correlations between
overall instructor rating and student achievement averaged .34, while for full-time
faculty the correlation was .48. In terms of the time at which the ratings were
administered, the average correlation was much higher when students knew their
final grades (.85) than when they did not know their final grades (.38). When
achievement tests were graded by students' own instructors, the correlation between
overall instructor ratings and achievement was .15. The correlation averaged .52
when an external grader was used.

Discussion
The present meta-analysis provides strong support for the validity of student
ratings as measures of teaching effectiveness. Teachers whose students do well on
achievement measures receive higher instructional ratings than teachers whose

TABLE V
Correlations of Study Characteristics With Overall Instructor/Achievement Effect Sizes (N = 67)

Study Characteristic                                   Correlation with Effect Size
Assignment of students to sections                        -0.04
Control for scoring bias in achievement criterion          0.12
Control for author bias in achievement criterion           0.12
Control for bias in evaluating achievement                 0.29*
Control for rating instrument bias                         0.15
Statistical control for ability                            0.05
Control for prior knowledge of instructor                  0.03
Time at which ratings administered                        -0.43**
Length of instruction                                     -0.16
Teacher autonomy                                           0.12
Number of sections                                        -0.14
Overall study quality                                     -0.04
Content emphasis on hard discipline                       -0.01
Content emphasis on pure knowledge                         0.06
Content emphasis on life studies                          -0.06
Course level                                               0.13
Institutional setting                                     -0.06
Instructor experience                                      0.25*
Source of study                                           -0.04
Publication year                                           0.10

* p < 0.05
** p < 0.001

300

Downloaded from http://rer.aera.net at LAURENTIAN UNIV LIBRARY on June 5, 2016


STUDENT RATINGS

students do poorly. The findings presented here reinforce conclusions reached by


earlier reviewers. The consensus of those earlier reviews was that there is a low to
moderate correlation between student ratings of instruction and student achievement.
This study demonstrates that the relationship between ratings and achievement is
slightly stronger and more consistent than was previously thought.
Reviewers in this area have used the voting method to synthesize research and draw overall generalizations. Because the number of studies cited in these reviews was relatively small, each individual study carried considerable weight in determining
the reviewer's conclusions. Thus, the more frequently cited studies have strongly
influenced the generalizations made concerning this literature. The investigation
cited most often by reviewers, the Rodin and Rodin (1972) study, shows a strong
negative correlation between ratings and achievement. Because of its controversial
findings, and because it was published in a prestigious journal (Science), this study
has received an enormous amount of attention. Although reviewers often question
the internal validity of the Rodin and Rodin study, it is likely that this study also
serves to temper their enthusiasm for the strength of the rating/achievement rela-
tionship.
The present meta-analysis included 68 multisection courses located in 41 inde-
pendent validity studies. This is more than three times the number of independent
studies in the most inclusive review (Marsh, 1980), and twice the composite number
of studies cited in all 11 reviews. With this size sample of studies, overall conclusions
are not likely to be colored by individual studies. For example, an examination of
Figure 2 shows that the Rodin and Rodin study is one of two "outlier" studies that
reported a large negative correlation between ratings and achievement. We can be
relatively certain that Rodin and Rodin's result is not representative of the true
relationship.

Overall Effects
The use of meta-analytic techniques also makes it possible to reach more exact
conclusions about the rating/achievement relationship. The first set of analyses in
the present investigation reported on the overall size and significance of rating/
achievement effects. Of prime importance was the correlation between the overall
instructor rating and student achievement. For 67 multisection courses the correlation
averaged .43, a moderately large effect. The magnitude of this correlation is probably
about as high as can be expected considering the restricted range of both mean
achievement scores and mean instructor ratings among different sections of a course.
Thus, in the typical study there was a strong tendency for students to rate most highly
teachers from whom they learned most.
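For readers unfamiliar with how such an average is formed, correlations are commonly averaged on the Fisher z scale and then transformed back to r; the sketch below illustrates that approach with hypothetical per-course coefficients and is not necessarily the exact averaging procedure used in this study.

```python
# One standard way to average per-course validity coefficients:
# transform to Fisher z, average, transform back. Input values are hypothetical.
import numpy as np

def mean_correlation(rs):
    zs = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(zs.mean()))

print(round(mean_correlation([0.55, 0.40, 0.30, 0.62, 0.20]), 2))  # -> 0.43
```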
In addition to determining the overall rating/achievement relationship, the meta-
analysis also focused on the degree to which the more specific instructional rating
dimensions related to student achievement. The obtained results suggest that certain
aspects of teaching, as measured by student ratings, are more related to learning than
are others. Correlational effect sizes for both the Skill and Structure dimensions were
large, .50 and .47, respectively. It is not surprising that Skill ratings, which measure the teacher's instructional competence, correspond well with student achievement. We
would expect that the more skilled instructors facilitate greater learning in their
students than instructors who are less adept. Perhaps not as evident is the strong relationship between Structure and achievement. Students of instructors who have everything going according to schedule, use class time well, explain course require-
ments, and in general have the class well organized tend to learn more than students
of instructors who are not well organized.
Other rating dimensions did not relate as well to student achievement. The
Rapport, Feedback, and Interaction ratings all deal with types of interpersonal
contact between students and the instructor. Although in the positive direction,
rating/achievement correlations for these dimensions were more modest than those
for the Skill and Structure dimensions. Interpersonal aspects of teaching do not seem
to be as important for student learning as the factors of instructor competence and
course organization. One might speculate that this is due to the information-oriented
nature of most multisection, introductory-level courses. Another major instructional
dimension, Course Difficulty, did not relate at all to student achievement. The
amount students learned did not depend on the relative difficulty of the particular
section in which they were enrolled.
The second set of analyses reported on the relationship between study character-
istics and study outcomes. Here, it was of interest to determine if differences in effect
sizes were related to potential biasing factors in study design, or if effects varied for
different study conditions. Three categories were used to classify study characteristic
variables: (1) methodological features of the study, (2) ecological conditions under
which the study took place, and (3) publication features of the study.

Methodological Features and Study Outcomes


One methodological feature that influenced the size of the rating/achievement
correlation was the time at which the ratings were administered. In a few studies
ratings were obtained from students after they learned about their final grades or
examination scores. In these studies rating/achievement correlations were very high.
For most of the studies in the sample, ratings were obtained from students before
they received grade information. These studies reported smaller rating/achievement
correlations. It seems likely, therefore, that students are influenced to a certain extent
by knowledge of their grades. In this situation, as Feldman (1976) points out, the
interpretation of this relationship becomes confounded. High positive correlations
may show that good teachers are receiving high ratings, but, on the other hand, they
may indicate that by giving good grades teachers can buy good evaluations. In
judging rating validity, therefore, we should place more confidence in the results of
those studies where students rated instructors before they knew their grades. For
these studies the average rating/achievement correlation still indicated a moderately
large effect.
The only other methodological variable that correlated significantly with effect
size was evaluation bias. The correlation between ratings and achievement was large
when an external grader was used or when each instructor evaluated one part of the
test for all students. For 10 studies in which achievement tests were evaluated by
students' own instructors, the effect size was much smaller. In each of these studies,
the individual instructors had final control over assigned achievement scores. Thus,
there existed the possibility of inconsistencies in grading practices and procedures
among instructors. Such an uncontrolled extraneous factor could potentially influence
the accuracy of reported achievement scores and consequently attenuate the rating/
achievement correlation.
Of particular interest in the present study were methodological features that other
investigators have hypothesized affect the rating/achievement relationship. For
instance, Leventhal (1975) maintained that random assignment of students is neces-
sary to be able to attribute differences in student achievement to different teachers.
When students self-select into sections with knowledge of teachers' reputations, the
relationship between ratings and achievement may become confounded. The meta-
analysis showed that studies in which students were randomly assigned to sections
produced findings no different from those of studies that did not control enrollment
procedures. Furthermore, whether or not students knew prior to enrollment which
instructors were teaching the course did not relate to effect size. Although random
assignment is preferable in multisection validity designs, it is often difficult to achieve
under the constraints of student registration procedures. Therefore, most studies in
this area have not randomly assigned students to sections. The present findings
suggest, though, that student section selection factors do not contribute to any
systematic bias in generalizing an overall rating/achievement effect.
Whether or not researchers statistically controlled for initial differences in student
ability did not affect the magnitude of the rating/achievement correlation. This result
supports the preliminary findings of Kulik and Kulik (1974). Based on nine independent studies, they calculated medians for both adjusted (part or partial) correlations and unadjusted (raw) correlations. The median adjusted correlation found in
these studies was .27; the median unadjusted correlation was .23. Although the
present meta-analysis found a larger overall effect size than did the Kuliks, there still
was little difference between adjusted or unadjusted rating/achievement correlations.
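The adjusted coefficients referred to here are part or partial correlations in which a measure of prior ability is removed from the relationship; a minimal sketch of the partial-correlation case, with illustrative variable names, follows.

```python
# Partial correlation between section mean ratings (x) and section mean
# achievement (y), controlling for section mean ability (z). Names are
# illustrative; this is not the original analysis code.
import numpy as np

def partial_corr(x, y, z):
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    rx = x - np.polyval(np.polyfit(z, x, 1), z)  # residual of x after regressing on z
    ry = y - np.polyval(np.polyfit(z, y, 1), z)  # residual of y after regressing on z
    return float(np.corrcoef(rx, ry)[0, 1])

# raw_r      = np.corrcoef(ratings, achievement)[0, 1]
# adjusted_r = partial_corr(ratings, achievement, ability)
```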
The studies in the sample employed a variety of rating instruments, scales, and
individual items. Marsh and Overall (1980) have maintained that differences in
rating/achievement correlations may be due to the lack of well-defined factor
structures in most of the rating scales used. The present findings do not support this
speculation. First of all, nearly three-quarters of the studies used some sort of
standardized ratings. Only 11 of 41 studies used single-item or experimenter-con-
structed ratings. More importantly, there was no difference in the size of correlational
effects between studies using standardized ratings and those using unstandardized
ratings.
Some reviewers have been concerned that rating/achievement correlations vary
according to the number of sections used in the study. In the present meta-analysis,
the number of sections on which correlations were based ranged from five to 121.
The relationship between number of sections and effect size was nonlinear. Actually,
number of sections correlated significantly with the absolute value of effect size;
studies using small numbers of sections tended to report either large positive or large
negative correlational effects. This supports the conclusions of other reviewers (e.g.,
Kulik & McKeachie, 1975; Marsh & Overall, 1980; Vecchio, 1980) that results from
studies using small numbers of sections are quite variable and difficult to interpret.
In the present instance, when including only studies that used 20 or more sections,
the average correlation between overall instructor rating and achievement was .37.
This compares quite favorably to the average effect computed over all studies.
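The pattern just described can be examined by correlating the number of sections with the absolute value of the effect size rather than with its signed value; the small illustration below uses hypothetical values, not the study's data.

```python
# Hedged illustration: small-section studies produce more extreme effects,
# so |r| falls as the number of sections rises. All values are hypothetical.
import numpy as np

n_sections = np.array([5, 8, 12, 20, 36, 75, 121])                   # hypothetical
effect_r   = np.array([0.82, -0.60, 0.55, 0.41, 0.45, 0.40, 0.38])   # hypothetical

print(np.corrcoef(n_sections, np.abs(effect_r))[0, 1])  # clearly negative
print(np.corrcoef(n_sections, effect_r)[0, 1])          # near zero for the signed values
```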
The other methodological variables had no effect on the size of the rating/
achievement correlation. Studies using objective achievement examinations produced
results similar to those of studies using essay tests. Whether a departmental or a
standardized examination measured student achievement did not make a difference
in effect sizes. Nor did the degree of teacher autonomy affect study outcomes. It did
not matter whether teachers were responsible for all instruction or only a component
of instruction. Finally, overall study quality, which was based on a composite of all
methodological variables, did not significantly relate to the magnitude of correlational
effect sizes.
The methodological features discussed above have implications for the design and
interpretation of multisection validity studies. The meta-analysis showed that it is
important to set controls for certain extraneous influences. For instance, we should
be cautious in generalizing from results of studies that have not controlled for factors
such as the timing of ratings and evaluation bias. We should also place more
confidence in results of studies that use an adequate number of sections. Not
controlling for other potential extraneous factors such as student section selection,
rating instrument bias, and differences in initial student ability did not seem to
threaten the external validity of the sample of studies included in the present meta-
analysis.

Ecological Conditions and Study Outcomes


A second set of variables was constructed to classify ecological features of the
studies. These variables helped determine whether effect sizes varied according to
the conditions under which the study took place. Only one of these variables,
instructor experience, related to size of correlational effect. Previous investigators
(e.g., Sullivan & Skanes, 1974; Whitely & Doyle, 1979) have found large differences
between rating/achievement validity coefficients generated from samples of full-time
faculty and graduate student instructors. They described large correlations between
ratings and achievement for full-time faculty; correlations were near zero for graduate
student instructors. Explanations for these varying effects have included differences
in: instructional experience (Kulik & Kulik, 1974; Seibert, 1979); the "psychological
distance" between students and instructors (Doyle & Crichton, 1978); commitment
to teaching (Sullivan & Skanes, 1974); and level of responsibility for course operation
(Gessner, 1973). In the present analysis, there was a tendency for rating/achievement
correlations to be slightly higher when full-time faculty, as opposed to graduate
students, were the teachers of the course. While validity coefficients for full-time
faculty were large, however, those for graduate student instructors also demonstrated
a moderate relationship between ratings and achievement. It is not likely that
differences in teacher autonomy explain this finding. As indicated earlier, the meta-
analysis showed that effect sizes were similar in courses where the instructor was
responsible for all instruction or responsible for only a component of instruction.
Other study conditions did not affect the size of the rating/achievement correla-
tions. Correlational effects were similar for courses taught at different types of
institutions. The same effects also emerged for different subject matter areas. Findings
were similar in "hard" and "soft" disciplines, in pure and applied areas, in life studies
and other content areas. And finally, because almost all of the multisection courses were taught at the introductory level, the effect of course level could not be determined.
Publication Features and Study Outcomes
Two variables were created to describe publication features of the studies. Neither
of these variables related to the magnitude of the rating/achievement correlations. Although year of publication ranged over a 30-year period, most of the studies had
been conducted within the last decade. More recent studies did not produce effects
greatly different from those of earlier studies. Nor was publication source associated
with effect size. Studies published in journals and unpublished studies reported
similar findings.

Summary and Conclusions


The meta-analysis presents findings that are not evident in other reviews of the
rating/achievement relationship. For instance, from the meta-analysis we can deter-
mine which areas in this literature are in need of further study. Few studies have been conducted in courses other than those at the introductory level. The overall relationship
between ratings and achievement may be different in more advanced courses.
Perhaps teaching aspects measured by interpersonal dimensions (e.g., Rapport,
Interaction, Feedback) are more likely to relate to outcome criteria other than student
learning, such as student interest in the subject matter, enrollment in advanced
courses, and so forth. Questions such as these are difficult to answer at the present
time and merit future study. There are also few if any multisection studies on the
relationship between ratings and student retention. As McKeachie (1979) points out,
students will often make up for a teacher's deficiencies by extra studying. Thus
differences in achievement due to variation in teaching quality may be washed out.
A measure of how much students have retained at some later time may be a better
index of learning against which to validate ratings.
Meta-analysis also helps determine the generalizability of findings from one setting to another. In the present study, each multisection course was treated as the unit of
analysis. Correlational effect sizes for the courses were remarkably stable under a
variety of study conditions and methodological manipulations. We can be quite
confident that the relationship between ratings and achievement described in this
meta-analysis is characterized by what Bracht and Glass (1968) term "external
validity." That is, the present findings can be generalized to different students,
instructors, institutions, and subject matter areas.
The meta-analysis cannot, however, resolve questions concerning the internal
validity of the sample of studies. It is difficult to determine in any one study, let
alone the entire set of studies, the extent to which achievement differences among
sections can be attributed to differences among teachers. What the meta-analysis
does show is that study findings are not affected much by methodological controls.
In general, studies that seek to reduce threats to internal validity produce outcomes
no different from those that do not.
Based on the findings of the meta-analysis, we can safely say that student ratings
of instruction are a valid index of instructional effectiveness. Students do a pretty
good job of distinguishing among teachers on the basis of how much they have
learned. Thus, the present study lends support to the use of ratings as one component
in the evaluation of teaching effectiveness. Both administrators and faculty should
feel secure that to some extent ratings reflect an instructor's impact on students.

References
Bendig, A. W. The relation of level of course achievement to students' instructor course ratings
in introductory psychology. Educational and Psychological Measurement, 1953, 13, 437-448. (a)
Bendig, A. W. Student achievement in introductory psychology and student ratings of the competence and empathy of their instructors. Journal of Psychology, 1953, 36, 427-433. (b)
Benton, S. E., & Scott, O. A comparison of the criterion validity of two types of student response
inventories for appraising instruction. Paper presented at the annual meeting of the National
Council on Measurement in Education, San Francisco, April 1976. (ERIC Document
Reproduction Service No. ED 128 397)
Biglan, A. The characteristics of subject matter in different academic areas. Journal of Applied
Psychology, 1973, 58, 195-203.
Bolton, B., Bonge, D., & Marr, J. Ratings of instruction, examination performance, and
subsequent enrollment in psychology courses. Teaching of Psychology, 1979, 6, 82-85.
Borg, W. R., & Hamilton, E. R. Comparison between a performance test and criteria of
instructor effectiveness. Psychological Reports, 1956, 2, 111-116.
Bracht, G. H., & Glass, G. V The external validity of experiments. American Educational
Research Journal, 1968, 5, 437-474.
Braskamp, L. A., Caulley, D., & Costin, F. Student ratings and instructor self-ratings and their
relationship to student achievement. American Educational Research Journal, 1979, 16, 295-
306.
Bryson, R. Teacher evaluations and student learning: A reexamination. The Journal of Educa-
tional Research, 1974, 68, 12-14.
Campbell, D. T. Recommendations for APA test standards regarding construct, trait, and
discriminant validity. American Psychologist, 1960, 15, 546-553.
Campbell, D. T., & Stanley, J. C. Experimental and quasi-experimental designs for research on
teaching. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago: Rand-McNally,
1963.
Carnegie Commission on Higher Education. A classification of institutions of higher education
(Rev. ed.). Berkeley, Calif.: The Carnegie Foundation for the Advancement of Teaching,
1976.
Centra, J. A. Student ratings of instruction and their relationship to student learning. American
Educational Research Journal, 1977, 14, 17-24.
Centra, J. A. Determining faculty effectiveness. San Francisco: Jossey-Bass, 1979.
Chandler, T. A. The questionable status of student evaluations of teaching. Teaching of
Psychology, 1978, 5, 150-152.
Chase, C. I., & Keene, J. M. Validity of student ratings of faculty. Bloomington, Ind.: Bureau of
Educational Studies and Testing, Indiana University, 1979. (ERIC Document Reproduction
Service No. ED 169 870)
Cohen, J. Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic
Press, 1977.
Cohen, J., & Cohen, P. Applied multiple regression/correlation analysis for the behavioral sciences.
Hillsdale, N.J.: Lawrence Erlbaum, 1975.
Cohen, P. A. Effectiveness of student-rating feedback for improving college instruction: A
meta-analysis of findings. Research in Higher Education, 1980, 13, 321-341. (a)
Cohen, P. A. A meta-analysis of the relationship between student ratings of instruction and student
achievement. (Doctoral dissertation, University of Michigan, 1980). Dissertation Abstracts
International, 1980, 41(5-A), 2012. (University Microfilms No. 8025666) (b)
Cohen, P. A., Ebeling, B. J., & Kulik, J. A. A meta-analysis of outcome studies of visual-based
instruction. Educational Communication and Technology Journal, 1981, 29, 26-36.
Cohen, S. H., & Berger, W. G. Dimensions of students' ratings of college instructors underlying
subsequent achievement on course examinations. Proceedings of the 78th Annual Convention
of the American Psychological Association, 1970, 5, 605-606. (Summary)
Costin, F. Do student ratings of college teachers predict student achievement? Teaching of
Psychology, 1978, 5, 86-88.
Costin, F., Greenough, W. T., & Menges, R. J. Student ratings of college teaching: Reliability,
validity, and usefulness. Review of Educational Research, 1971, 41, 511-535.
Crooks, T. J., & Smock, H. R. Student ratings of instructors related to student achievement.
Urbana, Ill.: Office of Instructional Resources, University of Illinois, 1974.
Doyle, K. O. Student evaluation of instruction. Lexington, Mass.: D. C. Heath, 1975.
Doyle, K. O., & Crichton, L. I. Student, peer, and self-evaluations of college instructors. Journal
of Educational Psychology, 1978, 70, 815-826.
Doyle, K. O., & Whitely, S. E. Student ratings as criteria for effective teaching. American
Educational Research Journal, 1974, 11, 259-274.
Elliott, D. N. Characteristics and relationships of various criteria of college and university
teaching. Purdue University Studies in Higher Education, 1950, 70, 5-61.
Ellis, N. R., & Rickard, H. C. Evaluating the teaching of introductory psychology. Teaching of
Psychology, 1977, 4, 128-132.
Endo, G. T., & Della-Piana, G. A validation study of course evaluation ratings. Improving
College and University Teaching, 1976, 24, 84-86.
Feldman, K. A. Grades and college students' evaluations of their courses and teachers. Research
in Higher Education, 1976, 4, 69-111.
Follman, J. Student ratings and student achievement. JSAS Catalog of Selected Documents in
Psychology, 1974, 4, 136. (Ms. No. 791)
Frey, P. W. Student ratings of teaching: Validity of several rating factors. Science, 1973, 182,
83-85.
Frey, P. W. Validity of student instructional ratings: Does timing matter? Journal of Higher
Education, 1976, 47, 327-336.
Frey, P. W., Leonard, D. W., & Beatty, W. W. Student ratings of instruction: Validation
research. American Educational Research Journal, 1975, 12, 435-447.
Gage, N. L. Students' ratings of college teaching: Their justification and proper use. In N. S.
Glasman & B. R. Killait (Eds.), Second UCSB Conference of Effective Teaching. Santa
Barbara, Calif.: Graduate School of Education and Office of Instructional Development,
University of California, Santa Barbara, 1974.
Gessner, P. K. Evaluation of instruction. Science, 1973, 180, 566-570.
Glass, G. V Primary, secondary, and meta-analysis of research. Educational Researcher, 1976,
5, 3-8.
Glass, G. V Integrating findings: The meta-analysis of research. In L. S. Shulman (Ed.), Review
of research in education (Vol. 5). Itasca, Ill.: F. E. Peacock, 1978.
Glass, G. V, & Smith, M. L. Meta-analysis of research on class size and achievement.
Educational Evaluation and Policy Analysis, 1979, 1, 2-16.
Glass, G. V, & Stanley, J. C. Statistical methods in education and psychology. Englewood Cliffs,
N. J.: Prentice-Hall, 1970.
Greenwood, G. E. et al. A study of the validity of four types of student ratings of college
teaching assessed on a criterion of student achievement gains. Research in Higher Education,
1976, 5, 171-178.
Grush, J. E., & Costin, F. The student as consumer of the teaching process. American
Educational Research Journal, 1975, 12, 55-66.
Guide to DIALOG searching. Palo Alto, Calif.: Lockheed DIALOG Information Retrieval
Service, Lockheed Missiles & Space Company, 1979.
Hall, J. A. Gender effects in decoding non-verbal cues. Psychological Bulletin, 1978, 85, 845-
857.
Hoffman, R. G. Variables affecting university student ratings of instructor behavior. American
Educational Research Journal, 1978, 15, 287-299.
Isaacson, R. L. et al. Dimensions of student evaluations of teaching. Journal of Educational
Psychology, 1964, 55, 344-351.
Kulik, C-L., Kulik, J. A., & Cohen, P. A. Instructional technology and college teaching.
Teaching of Psychology, 1980, 7, 199-205.
Kulik, J. A., Cohen, P. A., & Ebeling, B. J. Effectiveness of programmed instruction in higher
education: A meta-analysis of findings. Educational Evaluation and Policy Analysis, 1980,
2(6), 51-64.
Kulik, J. A., & Kulik, C-L. C. Student ratings of instruction. Teaching of Psychology, 1974, 1,
51-57.
Kulik, J. A., Kulik, C-L. C., & Cohen, P. A. A meta-analysis of outcome studies of Keller's
personalized system of instruction. American Psychologist, 1979, 34, 307-318. (a)
Kulik, J. A., Kulik, C-L. C., & Cohen, P. A. Research on audio-tutorial instruction: A meta-
analysis of comparative studies. Research in Higher Education, 1979, 11, 321-341. (b)
Kulik, J. A., Kulik, C-L. C., & Cohen, P. A. Effectiveness of computer-based college teaching:
A meta-analysis of findings. Review of Educational Research, 1980, 50, 525-544.
Kulik, J. A., & McKeachie, W. J. The evaluation of teachers in higher education. In F. N.
Kerlinger (Ed.), Review of research in education (Vol. 3). Itasca, Ill.: Peacock, 1975.
Leventhal, L. Teacher rating forms: Critique and reformulation of previous validation designs.
Canadian Psychological Review, 1975, 16, 269-276.
Leventhal, L., Abrami, P., & Perry, R. Bogus evidence for the validity of student ratings. Paper
presented at the annual meeting of the American Psychological Association, San Francisco,
August 1977. (ERIC Document Reproduction Service No. ED 150 510)
Leventhal, L. et al. Section selection in multi-section courses: Implications for the validation
and use of teacher rating forms. Educational and Psychological Measurement, 1975, 35, 885-
895.
Marsh, H. W. Research on students' evaluations of teaching effectiveness: A reply to Vecchio.
Instructional Evaluation, 1980, 4(2), 5-13.
Marsh, H. W., Fleiner, J., & Thomas, C. S. Validity and usefulness of student evaluations of
instructional quality. Journal of Educational Psychology, 1975, 67, 833-839.
Marsh, H. W., & Overall, J. U. Validity of students' evaluations of teaching effectiveness:
Cognitive and affective criteria. Journal of Educational Psychology, 1980, 72, 468-475.
Mauger, P. A., & Kolmodin, C A. Long-term predictive validity of the Scholastic Aptitude
Test. Journal of Educational Psychology, 1975, 67, 847-851.
McKeachie, W. J. Student ratings of faculty: A reprise. Academe, 1979, 65, 384-397.
McKeachie, W. J., Lin, Y-G., & Mann, W. Student ratings of teacher effectiveness: Validity
studies. American Educational Research Journal, 1971, 8, 435-445.
Mintzes, J. J. Field test and validation of a teaching evaluation instrument: The Student Opinion
of Teaching. Windsor, Ontario: University of Windsor, 1977. (ERIC Document Reproduction
Service No. ED 146 185)
Morsh, J. E., Burgess, G. G., & Smith, P. N. Student achievement as a measure of instructor
effectiveness. Journal of Educational Psychology, 1956, 47, 79-88.
Murdock, R. P. The effect of student ratings of their instructor on the student's achievement and
rating. Salt Lake City, Ut.: University of Utah, 1969. (ERIC Document Reproduction Service
No. ED 034 715)
Peterson, P. L. Direct instruction reconsidered. In P. L. Peterson & H. J. Walberg (Eds.),
Research on teaching. Berkeley, Calif.: McCutchan, 1979.
Rankin, E. F., Greenmum, R., & Tracy, R. J. Factors related to student evaluations of a college
reading course. Journal of Reading, 1965, 9, 10-15.
Remmers, H. H., Martin, F. D., & Elliott, D. N. Are students' ratings of instructors related to
their grades? Purdue University Studies in Higher Education, 1949, 66, 17-26.
Reynolds, D. V., & Hansvick, C. Graduate instructors who grade higher receive lower evaluations
by students. Paper presented at the annual meeting of the American Psychological Association,
Toronto, Ontario, September 1978.
Rodin, M., & Rodin, B. Student evaluations of teachers. Science, 1972, 177, 1164-1166.
Rosenthal, R. Experimenter effects in behavioral research. New York: Irvington, 1976.
Rubinstein, J., & Mitchell, H. Feeling free, student involvement, and appreciation. Proceedings
of the 78th Convention of the American Psychological Association, 1970, 5, 623-624. (Summary)
Seibert, W. F. Student evaluations of instruction. In S. C. Ericksen (Ed.), Support for teaching
at major universities. Ann Arbor, Mich.: University of Michigan, Center for Research on
Learning and Teaching, 1979.
Seldin, P. Successful faculty evaluation programs. Crugers, New York: Coventry Press, 1980.
Sheehan, D. S. On the invalidity of student ratings for administrative personnel decisions.
Journal of Higher Education, 1975, 46, 687-700.
Smith, M. L., & Glass, G. V Meta-analysis of psychotherapy outcome studies. American
Psychologist, 1977, 32, 752-760.
Solomon, D., Rosenberg, L., & Bezdek, W. E. Teacher behavior and student learning. Journal
of Educational Psychology, 1964, 55, 23-30.
Sorge, D. H., & Kline, C E. Verbal behavior of college instructors and attendant effect upon
student attitudes and achievement. College Student Journal, 1973, 7(4), 24-29.
Spencer, R. E., & Dick, W. Course evaluation questionnaire: Manual of interpretation (Research
Report No. 200). Urbana, Ill.: Office of Instructional Resources, University of Illinois, 1965.
Sullivan, A. M., & Skanes, G. R. Validity of student evaluation of teaching and the character-
istics of successful instructors. Journal of Educational Psychology, 1974, 66, 584-590.
Turner, R. L., & Thompson, R. P. Relationships between college student ratings of instructors and
residual learning. Paper presented at the annual meeting of the American Educational
Research Association, Chicago, April 1974.
Vecchio, R. P. Student ratings of instructors: Should we take them seriously? Instructional
Evaluation, 1980, 4(2), 1-4.
Ware, J. E., & Williams, R. G. The Dr. Fox effect: A study of lecture effectiveness and ratings
of instruction. Journal of Medical Education, 1975, 50, 149-156.
Wherry, R. J. Control of bias in ratings (PRS Reports 914, 915, 919, 920, and 921). Washington,
D.C.: Department of the Army, The Adjutant General's Office, 1952.
Whitely, S. E., & Doyle, K. O. Validity and generalizability of student ratings from between-
classes and within-class data. Journal of Educational Psychology, 1979, 71, 117-124.

AUTHOR
PETER A. COHEN, Assistant Director OISER, Adjunct Assistant Professor of
Psychology, Dartmouth College, Webster Hall, Hanover, NH 03755. Specializa-
tion: Instructional evaluation; research on college teaching; research synthesis.
