Problems in the pipeline: Stereotype threat and women's achievement in high-level math courses☆,☆☆

https://doi.org/10.1016/j.appdev.2007.10.004

Abstract

It is well established that negative stereotypes can undermine women's performance on mathematics tests. There is considerable laboratory evidence for the role of “stereotype threat” in girls' and women's math test performance. Even so, the relevance of these findings to the “real world” gender test-score gap remains unclear, and debates about the gap's causes focus primarily on innate sex differences in cognitive capacity. We report the results of a field experiment that tested whether the stereotype threat formulation helps explain women's performance in upper-level college mathematics. The participants were men and women who are highly motivated, proficient mathematicians in the pipeline to mathematics and science professions. Our primary hypothesis was confirmed. In an experimental group that received a stereotype-nullifying presentation of the test, women's performance rose significantly and surpassed that of the men in the course. In a control group, test-takers received the test under standard instructions, and women and men performed equally. This pattern of results suggests that stereotype threat suppresses test performance even among the most highly qualified and persistent women in college mathematics.

Introduction

How much of the gender gap in math and science achievement is due to nature and how much to nurture is a source of concern, debate, and political thin ice. This is true for many who have a stake in understanding why boys and girls appear to differ in mathematical ability. Some studies have found evidence for biological differences (e.g., Halpern, 1992). However, there remains considerable debate about how far biology explains differences in math achievement (Benbow & Stanley, 1980; Eccles et al., 1984; Gottfredson, 2002; Pinker, 2005; Spelke, 2005; Stipek & Gralinski, 1991). The depth and political charge of this debate were underscored recently by Lawrence Summers, then president of Harvard University. In a speech (Summers, 2005), he suggested a primary reason why men vastly outnumber women at the upper rungs of science and mathematics professions. In his view, biological differences result in “different availability of aptitude at the high end.” In other words, he argued that “social forces” are not the main thing holding women back. Instead, he attributed their limitation mostly to innate biological differences that make them less capable than men of doing high-level mathematics.

In this paper, we present data that cast doubt on the “different availability of aptitude at the high end” hypothesis. Specifically, we report a field study of stereotype threat, an environmental impediment to women's math performance. The study examines how stereotype threat relates to the test-score gap between men and women in a setting at the epicenter of the nature-nurture debate: the advanced mathematics courses at a major university. These courses matter because they are the “pipelines” to hard-science careers. Students must succeed in them to enter the fields where women are traditionally under-represented. The experiment compares a group of men and women for whom the stereotype is nullified with a control group that takes the same test under standard testing procedures. This design lets us evaluate whether stereotype threat is a significant factor in the test-score gap among these students. Our data show that there are substantial (albeit not equal) numbers of talented women in the pipeline to hard-science professions. The data also strongly suggest two further conclusions. First, women can earn grades on par with men's in the most difficult mathematics courses. Second, once disruptive social forces are minimized, women can even surpass men on difficult math tests.

The gap in mathematics performance that has rekindled the nature-nurture debate can be documented at practically every level of schooling, especially among the highest achievers (National Science Foundation, 2006). For example, a higher percentage of males than females perform at or above proficiency in mathematics in the fourth (35% versus 30%), eighth (30% versus 27%), and twelfth (19% versus 14%) grades. The gap between males and females on the SAT has also shrunk over the years. Even so, males still outscore females by 34 points on the mathematics section of this high-stakes test (College Board, 2005). What is more, these early differences in performance may lay the foundation for later career aspirations. For example, women earn nearly half of all science and engineering doctoral degrees, yet they are vastly under-represented in the so-called “hard” sciences. Specifically, in 2003 women earned over half of the doctoral degrees in the social and behavioral sciences. By contrast, they earned only 29% of doctoral degrees in the physical sciences, 24% in mathematics, and 19% in engineering (National Science Foundation, 2006).

What explains these sex-based differences in performance and participation? Biological factors may play a potent role (see Halpern, 1992, for a review). However, the role of sociocultural forces, such as sex stereotypes, has also been well established (Eccles, 1994; Eccles & Jacobs, 1992; Eccles et al., 1990; Good et al., in preparation-a; Good et al., in preparation-b; see Aronson & Steele, 2005, and Steele et al., 2002, for reviews). For example, one line of research suggests that parents are important socializers of sex-based achievement differences in mathematics (Eccles, 1994; Eccles & Jacobs, 1992; Eccles et al., 1990). The expectancies parents set early in childhood may lay the foundation for later underperformance. They may also shape patterns of interest and participation during adolescence.

Another growing line of research suggests a different explanation for why girls and women underperform on math tests. The cause is not necessarily that their parents have socialized them to believe they lack ability. Rather, they are vulnerable to negative stereotypes spread through the broader culture. This research shows that girls and women perform better when stereotypes are not activated, or when other cues in the environment nullify them. When negative stereotypes are activated and left unchecked, they trigger a number of disruptive psychological processes that undermine test performance (Croizet et al., 2004; Dar-Nimrod & Heine, 2006; Davies et al., 2002; Good et al., 2003; Inzlicht & Ben-Zeev, 2000; Johns et al., 2005; McGlone & Aronson, 2006; McIntyre et al., 2003; Schmader, 2002; Schmader & Johns, 2003; Spencer et al., 1999). This phenomenon is called “stereotype threat.” It has also been used to explain the underperformance of African-Americans, Latinos, and a variety of other minority groups (for reviews, see Aronson & Steele, 2005; Steele et al., 2002).

This research has clarified several questions. It has shown which situations are most likely to lead to underperformance (Good et al., in preparation-a; Good et al., in preparation-b; Inzlicht & Ben-Zeev, 2000; Spencer et al., 1999). It has shown the age at which females are likely to experience stereotype threat (Good & Aronson, in preparation). It has also identified methods of overcoming stereotype threat (Aronson et al., 2002; Cohen et al., 2006; Good et al., 2003; Johns et al., 2005). For example, researchers have found that certain situations are enough to induce underperformance, such as taking a diagnostic test in a domain in which one's group faces negative stereotypes (Steele & Aronson, 1995). In addition, anything that makes one's social identity salient can lead to stereotype-based underperformance. One example is indicating one's race before taking a diagnostic verbal test (Steele & Aronson, 1995). Even the sex composition of the testing situation can influence vulnerability to stereotype threat. Women's performance on a math test declined as the number of men taking the test in the same room increased (Inzlicht & Ben-Zeev, 2000). Studies such as these clearly implicate both the social situation and the performer's temporary mindset in the underperformance of stereotyped groups, such as women in math.

Yet despite considerable empirical support, stereotype threat is not widely accepted as a factor in the mathematics gender gap. One reason, we believe, is that much of the research showing that stereotype threat undermines test performance has been conducted in psychology laboratories. As a result, the findings can easily be dismissed as irrelevant to real-world test-score gaps. For example, Sackett and his colleagues (Sackett, Hardison, & Cullen, 2004) discussed the stereotype threat experiments of Steele and Aronson (1995), which demonstrated large effects of stereotypes on the test scores of African-Americans. Sackett and colleagues recently argued that these experiments were interesting but irrelevant to the test-score gap. In essence, the argument is that the experiments show only that inducing stereotype threat can widen an existing gap. They do not show that reducing stereotype threat in real classrooms or testing centers would narrow the actual gap between black and white students.

Scientific evidence at odds with this argument is accumulating. It shows specifically that reducing stereotype threat significantly narrows the gap between stereotyped and non-stereotyped students. For example, Stricker and Ward (2004) conducted a large study for the Educational Testing Service (ETS). In that study, the standard demographic question about test-taker gender was moved to the end of the test, which reduced the level of stereotype threat during the test. The data showed that this simple change produced significantly higher performance among women who took the AP calculus test. The results implied that if ETS officially adopted this change, 4,700 additional female students would receive Advanced Placement credit in calculus each year. Perhaps attesting to the politicized nature of research in this area, Stricker and Ward presented their data as indicating no significant effect of asking about gender on women's performance. In fact, however, a reanalysis of the ETS data (Danaher & Crandall, in press) showed these effects to be substantial and significant. Other studies have also demonstrated the relevance of stereotype threat. They show that interventions that reduce it also significantly narrow performance gaps relative to control groups (Brown & Day, 2006; Cohen et al., 2006; McGlone & Aronson, 2006). Together, these findings strongly suggest that stereotype threat contributes to existing gaps between men and women in mathematics. They also suggest it contributes to gaps between minority and white students across areas of cognitive ability.

Research on how stereotype threat affects women's mathematics performance has focused mostly on women in college-level mathematics (see Steele et al., 2002, for a review). This leaves open the question of when stereotype-based underperformance begins to emerge. The few studies that have tackled this question suggest that stereotyped individuals may be vulnerable to stereotype threat as early as elementary school. For example, African-American and Latino children as young as six years old performed more poorly on a cognitive task when it was described as a measure of ability. They did better when the same task was described as a problem-solving task (McKown & Weinstein, 2003). However, this difference emerged only among children who were aware of broadly held stereotypes about academic ability. Moreover, race- and ethnicity-based stereotypes and sex-based stereotypes do not necessarily operate in the same way. For example, stereotype threat has been found to undermine girls' math performance in sixth grade, but not in fourth or fifth grade (Good & Aronson, in preparation). Whatever the exact age or grade level, these studies show that stereotype threat can undermine performance during the elementary school years. By late adolescence, its cumulative effects can undermine students' career plans (Good et al., in preparation-a), social identities (Good, Dweck, & Aronson, 2007), and performance on high-stakes evaluations.

In recent years, some researchers have turned their attention toward finding ways to alleviate stereotype threat. One line of research has addressed the underlying message of the stereotype: that stereotyped individuals are inherently limited because of their group membership. This approach encourages stereotyped individuals to reject the idea that intelligence is a fixed trait. Instead, they are encouraged to adopt the mindset that intelligence can be increased through hard work and effort. In one study, African-American college students who were exposed to this view earned higher grade point averages. They also later reported that academic achievement was more important to their sense of self (Aronson, Fried, & Good, 2002). In a related study, college students mentored middle school students and helped them create web pages advocating the view that intelligence is a malleable capacity (Good, Aronson, & Inzlicht, 2003). Some of the middle school girls received this malleable-intelligence message, while their female peers instead received an anti-drug message as part of the intervention. At the end of the year, the girls who received the malleable-intelligence message earned higher scores on the statewide standardized math test. In a similarly remarkable study, Cohen et al. (2006) reduced the black–white GPA gap among low-income middle school students. They did so by affirming the students' self-concepts at the beginning of the school term, presumably inoculating them against stereotype threat.

These techniques, although effective, are unlikely to be implemented in standard testing situations such as the SAT or GRE. They are also unlikely to be adopted by most mathematics curricula. Thus, another aim of this research was to test whether stereotype threat could be reduced in a way that could easily be replicated and implemented in everyday school experiences.

In sum, stereotype threat has been found to undermine females' mathematics performance as early as sixth grade. Even so, most studies of how stereotypes affect females' math achievement have focused on college-aged women. Moreover, many of these studies have concentrated on females' performance on general mathematics tasks. Most studies recruit participants who are highly identified with mathematics or who have performed well on mathematics tasks in the past (e.g., Spencer et al., 1999). However, none has tested whether stereotype threat suppresses the scores of women enrolled in the most difficult mathematics courses, the courses that lay the foundation for careers in math and science. For example, Spencer et al. (1999) recruited participants from the psychology subject pool. These participants were screened for high math achievement (e.g., earning a B or better in calculus). Even so, the sample likely included women who had no intention of pursuing a math or math-based major or career. Consequently, these past studies left unaddressed the role of stereotypes and stereotype threat in math performance at the upper end of the math ability distribution. We still need information about the women in the pipeline who aspire to fill the void Summers described. Most importantly, these experiments were conducted in the laboratory, so their applicability to real-world test-score gaps can be questioned.

Accordingly, the present study tested whether the negative effects of stereotype threat extend beyond the laboratory. It also tested whether they extend beyond women's math performance in general to women's underperformance in the most difficult college mathematics courses. These courses produce future mathematicians, engineers, and scientists, and women vulnerable to stereotype threat might conceivably avoid them.

Section snippets

Participants and design

Participants were 174 calculus students enrolled in the final course of the university's most rigorous and fast-paced calculus sequence, at a large public university in the Southwest. This course satisfied degree requirements for mathematics, engineering, and many of the natural sciences, such as biology, chemistry, geology, and physics. Because students who enrolled in the course had already successfully navigated the previous semester's “gatekeeper” course, they were well on their …

Manipulation check

Two participants failed to answer the question about whether the test was biased, so responses to this question were analyzed for 155 participants. The two-way ANOVA performed on these responses revealed only a main effect of condition, F(1, 151) = 3.94, p = .05, Cohen's d = .34, r = .17. This represents a small-to-medium effect size. Participants in the non-stereotype-threat condition (M = 1.51, SD = 1.38) reported a significantly lower level of test bias than participants in the …
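Two standard formulas connect the reported degrees of freedom and effect sizes. What follows is an illustrative worked check, not the authors' own computation. It assumes that the two-way ANOVA is a 2 (sex) × 2 (condition) between-subjects design, which this excerpt does not state explicitly. Under that assumption, the error degrees of freedom equal N minus the number of cells, a·b. The check also uses the usual r-to-d conversion for two groups of roughly equal size:

```latex
\begin{align*}
% Error df for an assumed 2 (sex) x 2 (condition) between-subjects ANOVA
df_{\text{error}} &= N - ab = 155 - (2 \times 2) = 151 \\
% Converting the reported r to Cohen's d (two groups of roughly equal size)
d &= \frac{2r}{\sqrt{1 - r^{2}}} = \frac{2(0.17)}{\sqrt{1 - 0.17^{2}}} = \frac{0.34}{\sqrt{0.9711}} \approx 0.345
\end{align*}
```

With r rounded to .17, the conversion gives about .345. This agrees with the reported d = .34 to within the rounding of r.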

Discussion

This study revealed that the effects of stereotype threat are not limited to the typical woman's performance on general mathematics tests. Rather, negative stereotypes can affect even women at the upper end of the college ability distribution who opt to enroll in the most difficult math courses. Specifically, framing a difficult calculus test as diagnostic of ability suppressed women's calculus performance. However, assuring women that the same diagnostic test was …

References (49)

  • Benbow, C.P., et al. (1980). Sex differences in mathematical ability: Fact or artifact? Science.
  • Bridgeman, B., et al. (1991). Gender differences in predictors of college mathematics performance and in college mathematics course grades. Journal of Educational Psychology.
  • Brown, R.P., et al. (2006). The difference isn't black and white: Stereotype threat and the race gap on Raven's advanced progressive matrices. Journal of Applied Psychology.
  • Cohen, G., et al. (2006). Reducing the racial achievement gap: A social–psychological intervention. Science.
  • College Board. (2005). 2005 College-bound seniors: Total group profile report.
  • Croizet, J.-C., et al. (2004). Stereotype threat undermines intellectual performance by triggering a disruptive mental load. Personality and Social Psychology Bulletin.
  • Danaher, K., & Crandall, C.S. (in press). Stereotype threat in applied settings re-examined. Journal of Applied Social...
  • Dar-Nimrod, I., et al. (2006). Exposure to scientific theories affects women's math performance. Science.
  • Davies, P.G., et al. (2002). Consuming images: How television commercials that elicit stereotype threat can restrain women academically and professionally. Personality and Social Psychology Bulletin.
  • Eccles, J.S., et al. (1984). Sex differences in achievement: A test of alternate theories. Journal of Personality and Social Psychology.
  • Eccles, J.S. (1994). Understanding women's educational and occupational choices: Applying the Eccles et al. model of achievement-related choices. Psychology of Women Quarterly.
  • Eccles, J.S., et al. (1986). Social forces shape math attitudes and performance. Signs.
  • Eccles, J.S., et al. (1992). The impact of mothers' gender-role stereotypic beliefs on mothers' and children's ability perceptions. Journal of Personality and Social Psychology.
  • Eccles, J.S., et al. (1990). Gender role stereotypes, expectancy effects and parents' socialization of gender differences. Journal of Social Issues.

☆ Preparation of this paper was supported by a grant from the National Science Foundation (BCS-0217251) to Catherine Good and Carol S. Dweck, and by a National Science Foundation CAREER award (BCS-9875957) to Joshua Aronson.

☆☆ We thank the professors and teaching assistants of the calculus classes for allowing us to work with their students. We especially thank Gwenn Sherburne for her invaluable assistance in this regard. We also acknowledge support from New York University's Center for Research on Culture, Development and Education. Portions of this work were presented at the Annual Conference of the Midwestern Psychological Association in Chicago, IL.
