Recommendations for Use of Student Ratings of Instructors in Merit and Promotion Decisions¹
Student course ratings have many uses, particularly when viewed over time and across courses.² Student ratings provide information that instructors can use to identify areas of strength and areas needing improvement in their teaching.³ Students can also use course ratings to help them make enrollment choices.⁴ Furthermore, departments and teaching units can use student ratings in the aggregate to assess the overall performance of multi-course and multi-instructor units, and to evaluate individual instructors for personnel decisions regarding retention, promotion, tenure, and merit pay.
The nine recommendations listed below provide guidelines for the use of student course ratings in personnel decisions.
- Student ratings must be used together with other evidence of the quality of a faculty member's teaching, not as the sole indicator of teaching quality. Other sources, such as peer reviews of classroom sessions, peer reviews of curricular materials, and faculty self-reflection, should be assessed alongside student evaluations to gain an accurate sense of a faculty member's teaching skills and performance. These other sources of evidence are especially important because student ratings alone do not provide sufficient evidence of how much students learned in a course.
- Evaluations from more than a single section should be used in making any decision about teaching quality. Research has shown that ratings from at least five courses are necessary to ensure adequate reliability. The validity of the ratings as a measure of teaching quality increases as the data on which decisions are based represent a greater variety of course formats. Trends in ratings across years may also be important in assessing teaching.
- Only global ratings of teaching effectiveness, as represented by the first four items on Instructional Assessment forms, should be used in personnel decisions. The other, more specific items should be used by the faculty member to review particular skills and identify areas for improvement.
- Small differences in individual evaluations should not be used as a basis for differential decisions. Because student ratings are reported as numerical averages, it is tempting to overestimate the precision of those averages, yet small differences in ratings may not be meaningful. It is better to work with much broader classifications, such as "Excellent" vs. "Very Good."
- Interpretations of student rating averages should be guided by the awareness that, at a university that uses teaching excellence as a hiring criterion, teaching is typically at a high level. In addition, students tend to rate faculty at or near the high end of the scale: only about 5% of University of Washington faculty receive average ratings below the scale point labeled "Good." It is therefore not appropriate to treat the median (or 50th percentile) as a dividing line between strong and weak teachers; it is more appropriate to assume that the great majority of teachers are strong. When evaluating the average ratings of individual instructors, it is also appropriate to consider relevant comparisons (see Recommendation 6, on comparative data) and the specific characteristics of the courses taught (see Recommendation 7, on course characteristics).
- Comparative data should be used, but with caution. Colleges and departments should compile evaluation data for faculty teaching their courses over time, for use in comparisons. However, for comparisons to be useful, the normative group should include more than a narrow population of instructors. Smaller departments may prefer not to rely on departmental norms and instead use norms calculated across several similar departments or for the school or college as a whole. At times, it may be better to compare ratings of similar courses across departments than ratings of dissimilar courses within a department.
- Course characteristics should be considered when interpreting results. For example, large lecture courses typically receive lower ratings than smaller courses, courses taught for the first time receive lower ratings than well-established courses, introductory courses for non-majors receive lower ratings than upper-division courses for majors, and courses in departments with high student workloads typically receive lower ratings. Adjustments for course type should be made to gain a fairer sense of the faculty member's teaching skills. One way to adjust for course type is to choose similar courses for normative comparisons.
- Faculty members should be given an opportunity to respond to evaluation results. Faculty should have an opportunity to discuss the objectives of the course, how their teaching methods were used to meet those objectives, and how circumstances in the course might have affected evaluations. Furthermore, other evaluation information gathered from a given course (see Recommendation 1, on using multiple sources of evidence) can aid in interpreting ratings results.
- Administration of course ratings should be scheduled to maximize the number of respondents. Evaluations generally have greater validity when a higher proportion of enrolled students complete evaluation forms; when a smaller proportion responds, ratings may not accurately reflect the class as a whole. This problem can be particularly acute in small classes. As a general rule, at least one third of enrolled students must be represented in the results before any confidence can be placed in them. As the proportion of respondents decreases, particularly in small classes, there is greater opportunity for the ratings of one or a few students to disproportionately affect the results.
1 These recommendations were passed by the University of Washington Faculty Council on Instructional Quality (FCIQ) and approved by the Faculty Senate Executive Committee in Spring quarter 2003. Footnotes were not part of the original document; they have been added here to assist University of Washington faculty and administrators.
2 National Research Council. (2003). Evaluating and improving undergraduate teaching in science, technology, engineering, and mathematics. Washington, DC: The National Academies Press.
Lewis, K. G. (Ed.). (2001). Techniques and strategies for interpreting student evaluations. New Directions for Teaching and Learning, 87.
Braskamp, L. A., & Ory, J. C. (1994). Assessing faculty work: Enhancing individual and institutional performance. San Francisco: Jossey-Bass.
3 Consultation services at UW Seattle are provided by the Center for Instructional Development and Research.
4 Ratings of courses at UW Seattle may be viewed in the Course Evaluation Catalog.