It’s no secret that women still get the shaft in the work world. It will take 118 years before the gender pay gap closes, says the World Economic Forum’s latest Global Gender Gap Report. And even though women outnumber men in the workforce, only 4 percent of CEOs in the Standard & Poor’s 500 firms are female, according to a 2015 Catalyst study. Gender bias is still alive and well, with any number of factors working against a woman’s career advancement.
And according to recent UC Berkeley research, student teaching reviews are one of them.
Two separate studies indicate that student evaluations are inherently sexist—and that relying on them in employment decisions can jeopardize female career progress in academia.
Philip B. Stark, associate dean of the Berkeley Division of Mathematical and Physical Sciences, and graduate student Kellie Ottoboni ran an in-depth statistical analysis of two different data sets from U.S. and French universities, along with research fellow Anne Boring of OFCE-Sciences Po. Their conclusion: Student evaluators judge women instructors more harshly overall than men, rendering the evaluations a greater reflection of gender bias than teaching ability.
In the French experiment, students were assigned to male or female instructors, and when they were asked to evaluate them, male students (although not, on average, the female students) rated male instructors higher overall. Could it have been that male teachers were actually better at their jobs? Sure. But the evidence suggested otherwise: On average, male students who took classes with male instructors did worse on the final than those with female instructors. And at this particular university, all students take the same final exam, which is graded anonymously. In short, even though male students perceived their male teachers to be more effective, students taught by women in fact showed better mastery of the material and earned higher grades.
The American experiment was designed differently, and involved only online classes. Students had been randomly assigned a male or female instructor, and at the end of the course were asked to give their instructors an overall score, and to rank them based on care, respectfulness, enthusiasm, promptness, professionalism, knowledge, clarity, and the like.
But the students were misled about the gender of their online instructors, whom they could not see or hear. In two of the course sections, male instructors used female names, and female instructors used male names. Both male and female instructors covered the same material and returned all assignments at the same time.
The results: In a twist on the French experiment, this time it was U.S. female students who gave better ratings to instructors they thought were male—despite the fact that the male and female instructors’ performances were essentially the same.
“Students are not good judges of whether or not they have learned a lot. Or at least—when they’re asked how effective the instructor is, they’re answering some other question, no matter what they intend to do,” says Stark, a prominent critic of how student evaluations are used in academia. “We shouldn’t rely on student evaluations to measure teaching effectiveness. And if we do, not only are we not measuring teaching effectiveness, but we’re disadvantaging women.”
And this disadvantage can stretch out over the course of a woman’s entire teaching career.
Recently Haas School of Business professors Jennifer Chatman and Laura Kray analyzed differences in student evaluations of 76 tenure-track university faculty (19 female, 57 male) over the course of their careers. (They are withholding the name of the university, other than to say it is a top-ranked business school.) Like Stark, they found that men generally get the highest scores.
But they also discovered that female instructors receive lower student ratings than their male counterparts at what Kray and Chatman refer to as a “critical time” in the women’s careers—when they are just starting out and at their most productive. These lower ratings could lead to women being refused career advancements such as promotions and tenure. They also found that female instructors lose positive perceptions of competence at a steeper rate than men do as they age. When students gave descriptive feedback about instructors, they used more words overall for women’s evaluations, and also came down harder on women who weren’t conforming to the stereotype of being “supportive,” “interpersonal” or “warm.” Men who weren’t “warm” weren’t rated as negatively.
“Women need to simultaneously fulfill the stereotype to be warm and compassionate with the students, and at the same time be an absolutely complete, unambiguous rock star,” Chatman says. “Their warmth behavior is really a pivotal expectation among the students, where it’s completely irrelevant for men. It doesn’t matter if the man is warm or not. If he’s intelligent and a good teacher, he’s gonna get high ratings.”
The upshot? “People who set up these evaluation systems need to know that student reports of faculty teaching are biased,” says Chatman, “and they ought to have other methods of doing it.”
In an email response to an inquiry about the studies’ findings, UC Berkeley spokesperson Janet Gilmore wrote: “My understanding is that the work has not yet gone through a pre-publication, peer-review process, but it raises some important issues…. Our campus guidelines strongly encourage the use of multiple modes of assessment. This may help to guard against any implicit biases that might be at work in any one source of assessment.”
Scholarly studies have argued in favor of student evaluations since the 1970s. Many of the studies, however, don’t explore the presence of gender bias or address the issue of whether it should be considered when making judgments about instructors.
And according to the University of California’s Academic Personnel Manual, a committee charged with judging an instructor’s effectiveness is not supposed to consider only student evaluations. Employment decisions are also to be based on a range of factors such as the instructor’s level, enrollments, new courses taught or old courses taught that received recognition, awards, and evaluation by other faculty members.
So, all things considered, this may seem fair. But are all things always considered?
Stark says he’s seen “a number of personnel cases” in which the only evidence about teaching considered was student evaluations, a few non-random student comments, a list of courses taught, a list of graduate advising responsibilities and a statement by the faculty member being assessed. “I have heard of a grievance by a lecturer who alleged s/he was fired only because his/her student evaluations were below departmental averages,” Stark writes. “And I have heard of a faculty member initially denied tenure because of [student evaluations], but who was able to argue successfully that student evaluations did not reflect his/her teaching effectiveness.”
Chatman says that the goal of her research is to persuasively demonstrate that stereotyping women (particularly older women) is a prevalent problem, and that it’s something we must address on a broader, societal level—because sexism isn’t dead.
“Things have been changing, but slowly,” Chatman says. “The first step is to get a grip on the existence of the issue—its magnitude, which is huge—and figure out how to help women as a result.”
In light of these new studies, does UC Berkeley plan to reevaluate its use of student evaluations? Vice Provost for Faculty Janet Broughton says the University of California may want to consider revising the personnel manual guidelines as research on this subject continues—but revisions would be made only after consultation between the UC Academic Senate and UC administration system-wide.