Psychological Bulletin
1990, Vol. 107, No. 2, 139-155
Copyright 1990 by the American Psychological Association, Inc. 0033-2909/90/$00.75

Gender Differences in Mathematics Performance: A Meta-Analysis

Janet Shibley Hyde, Elizabeth Fennema, and Susan J. Lamon
University of Wisconsin-Madison

Reviewers have consistently concluded that males perform better on mathematics tests than females do. To make a refined assessment of the magnitude of gender differences in mathematics performance, we performed a meta-analysis of 100 studies. They yielded 254 independent effect sizes, representing the testing of 3,175,188 Ss. Averaged over all effect sizes based on samples of the general population, d was -0.05, indicating that females outperformed males by only a negligible amount. For computation, d was -0.14 (the negative value indicating superior performance by females). For understanding of mathematical concepts, d was -0.03; for complex problem solving, d was 0.08. An examination of age trends indicated that girls showed a slight superiority in computation in elementary school and middle school. There were no gender differences in problem solving in elementary or middle school; differences favoring men emerged in high school (d = 0.29) and in college (d = 0.32). Gender differences were smallest and actually favored females in samples of the general population, grew larger with increasingly selective samples, and were largest for highly selected samples and samples of highly precocious persons. The magnitude of the gender difference has declined over the years; for studies published in 1973 or earlier d was 0.31, whereas it was 0.14 for studies published in 1974 or later. We conclude that gender differences in mathematics performance are small. Nonetheless, the lower performance of women in problem solving that is evident in high school requires attention.

During the past 15 years, there has been much concern about women and mathematics.
Since Lucy Sells (1973) identified mathematics as the "critical filter" that prevented many women from having access to higher paying, prestigious occupations, there has been much rhetoric and many investigations focused on gender differences in mathematics performance. Particularly within the fields of psychology and education, gender differences in mathematics performance have been studied intensively, and there has been some consensus on the pattern of differences. Anastasi (1958), in her classic differential psychology text, stated that although differences in numerical aptitude favored boys, these differences did not appear until well into the elementary school years. Furthermore, she stated that if gender differences in computation did appear, they favored females, whereas males excelled on tests of numerical reasoning. Concurring with this, Maccoby and Jacklin (1974) concluded that one of four sex differences that "were fairly well established" was that "boys excel in mathematical ability" (p. 352). They also noted that there were few sex differences until about ages 12-13, when boys' "mathematical skills increase faster than girls'" (p. 352).

[Author note] This research was supported by National Science Foundation Grant MDR 8709533. The opinions expressed are our own and not those of the National Science Foundation. We thank Marilyn Ryan for her assistance in conducting the meta-analysis. We thank researchers at the Educational Testing Service, especially Carol Dwyer and Eldon Park, for their help in providing Educational Testing Service data. Correspondence concerning this article should be addressed to Janet Shibley Hyde, Department of Psychology, Brogden Psychology Building, University of Wisconsin, Madison, Wisconsin 53706.

Most recently, Halpern (1986) concluded that "the finding that males outperform females in tests of quantitative or mathematical ability is robust" (p. 57).
She stated that the differences emerge reliably between 13-16 years of age.

The literature in education has reported conclusions that are basically in agreement with the psychological literature. In 1974, Fennema reviewed published studies and concluded that

    No significant differences between boys' and girls' mathematics achievement were found before boys and girls entered elementary school or during early elementary years. In upper elementary and early high school years significant differences were not always apparent. However, when significant differences did appear they were more apt to be in the boys' favor when higher-level cognitive tasks were being measured and in the girls' favor when lower-level cognitive tasks were being measured. (Fennema, 1974, pp. 136-137)

In the Fennema review, no conclusions were made about high school learners because of the scarcity of studies of subjects of that age. However, a few years later, Fennema and Carpenter (1981) reported that the National Assessment of Educational Progress showed that there were gender differences in high school, with males outperforming females, particularly in high cognitive-level tasks. This conclusion has been reported by each succeeding National Assessment (Meyer, in press).

Stage, Kreinberg, Eccles, and Becker (1985), in a thorough review of the major studies that had been reported up to 1985, concluded that

    The following results are fairly consistent across studies using a variety of achievement tests: 1) high school boys perform a little better than high school girls on tests of mathematical reasoning (primarily solving word problems); 2) boys and girls perform similarly on tests of algebra and basic mathematical knowledge; and 3) girls occasionally outperform boys on tests of computational skills. . . . Among normal populations, achievement differences favoring boys do not emerge with any consistency prior to the 10th grade, are typically not very large, and are not universally found, even in advanced high school populations. There is some evidence, however, that the general pattern of sex differences may emerge somewhat earlier among gifted and talented students. (p. 240)

Thus, although there are some variations, there is a consensus that, overall, gender differences in mathematics performance have existed in the past and are still present. Global conclusions tend to assert simply that males outperform females on mathematics tests. More refined discussions generally conclude that the overall differences in mathematics performance are not apparent in early childhood; they appear in adolescence and usually favor boys in tasks involving high cognitive complexity (problem solving) and favor girls in tasks of less complexity (computation).

Theoretical Models of Gender and Mathematics Performance

Theoretical models concerning gender and mathematics performance generally begin with the assumption that males outperform females in mathematics. The models are designed to explain the causes of that phenomenon. For example, Eccles and her colleagues (e.g., Eccles, 1987; Meece, [Eccles] Parsons, Kaczala, Goff, & Futterman, 1982) have built an Expectation × Value model to explain differential selection of mathematics courses in high school. Fennema and Peterson (1985) proposed an autonomous learning behavior model that suggested that failure to participate in independent learning in mathematics contributes to the development of gender differences in mathematics performance. Others have proposed biological theories focusing, for example, on brain lateralization (reviewed by Halpern, 1986).

This model building may be premature because the basic phenomenon that the models seek to explain (the gender difference in mathematics performance) is in need of reassessment, using the modern tools of meta-analysis.
Meta-Analysis and Psychological Gender Differences

The reviews cited previously have all used the method of narrative review. That is, the reviewers located studies of gender differences, organized them in some fashion, and reported their conclusions in narrative form. The narrative review, however, has been criticized on several grounds: It is nonquantitative, unsystematic, and subjective, and the task of reviewing 100 or more studies simply exceeds the human mind's information-processing capacity (Hunter, Schmidt, & Jackson, 1982).

Meta-analysis has been defined as the application of "quantitative methods to combining evidence from different studies" (Hedges & Olkin, 1985, p. 13). In the 1980s, meta-analysis began to make important contributions to the literature on psychological gender differences (e.g., Hyde & Linn, 1986). Hyde (1981) performed a meta-analysis on the 16 studies of quantitative ability of subjects aged 12 or older that were included in Maccoby and Jacklin's (1974) review (12 being the age at which Maccoby and Jacklin concluded that the sexes begin to diverge in mathematics performance). Hyde found a median effect size of .43 and noted that this difference was not as large as one might have expected given the widely held view that the difference is well established.

The Hyde (1981) meta-analysis included only studies reported through 1973, and thus there is a need to update it with recent research. Furthermore, the median value of d was computed on the basis of only seven values. In addition, statistical methods have advanced considerably since the time of the Hyde review. Hedges and his colleagues have developed homogeneity statistics that allow one to determine whether a group of studies is uniform in its outcomes (Hedges & Olkin, 1985; Rosenthal & Rubin, 1982a).
Applied to the topic of gender differences in mathematics performance, these statistical techniques allow one to determine whether the magnitude of the gender difference varies according to the cognitive level of the task, the age group, and so on. Thus, modern techniques of meta-analysis can answer considerably more sophisticated questions than could the earlier meta-analyses and certainly more than could earlier narrative reviews.

Current Study

We performed a meta-analysis of studies of gender differences in mathematics performance. Our goal was to provide answers to the following questions:

1. What is the magnitude of gender differences in mathematics performance, using the d metric? We were chiefly interested in answering this question for the general population. However, we also provide analyses for selective samples.
2. Does the magnitude or direction of the gender difference vary as a function of the cognitive level of the task?
3. Does the magnitude or direction of the gender difference vary as a function of the mathematics content of the test (arithmetic, geometry, algebra, and so on)?
4. Developmentally, at what ages do gender differences appear or disappear, and for what cognitive levels?
5. Are there variations across ethnic groups in the magnitude or direction of the gender difference?
6. Does the magnitude of the gender difference vary depending on the selectivity of the sample, whether the sample is of the general population or of a population that is selected for high performance?
7. Has the magnitude of gender differences in mathematics performance increased or declined over the years?
Method

Sample of Studies

The sample of studies came from seven sources: (a) a computerized data base search of PsycINFO for the years 1967-1987, using the key terms human-sex-differences crossed with (mathematics or mathematics-concepts or mathematics-achievement or standardized tests), which yielded 198 citations; (b) a computerized data base search of ERIC, using the key terms sex-differences crossed with (mathematics or mathematics achievement or mathematics-tests), which yielded 435 citations; (c) inspection of all articles in Journal for Research in Mathematics Education and Educational Studies in Mathematics; (d) the bibliography of Maccoby and Jacklin (1974); (e) the bibliography of Fennema (1974); (f) norming data from widely used standardized tests; and (g) state assessments of mathematics performance.

In the case of the computerized literature searches, abstracts were printed for each citation. The abstracts were inspected, and citations that did not promise to yield relevant data (e.g., review articles or nonempirical articles) were excluded. All relevant articles were photocopied. Doctoral dissertations were obtained through interlibrary loan and were then inspected for the data necessary to compute effect sizes.

Only studies reporting psychometrically developed mathematics tests were included. Specifically, we excluded studies using Piagetian measures (e.g., the concept of conservation of number) because they assess a much different construct than do standardized tests. Grades, too, were excluded because they may measure a different construct, and because they are assigned more subjectively and may therefore be more subject to bias than are standardized tests. (See Kimball, 1989, for a review of gender differences in classroom grades; girls consistently outperform boys in mathematics grades.)
If an article appeared to have relevant data but the data were not presented in a form that permitted computation of an effect size, a letter was sent to the author at the address specified for reprints or at a more recent address found in the American Psychological Association Membership Register or the American Educational Research Association Directory.

Large-sample, normative data were obtained for the following widely used tests: American College Testing Program test (ACT), Graduate Management Admissions Test (GMAT), Scholastic Aptitude Test (SAT-Q), SAT Mathematics Level 1 and Level 2, Differential Aptitude Test (DAT), Graduate Record Examination (GRE-Q), GRE-Mathematics, California Achievement Test, and the Iowa Test of Basic Skills (ITBS).¹ Data from the National Assessment of Educational Progress (NAEP; Dossey, Mullis, Lindquist, & Chambers, 1988) were also included.

To obtain data from additional large-scale assessments, a letter was sent to one official of each state department of education and of the departments of education of the District of Columbia and the Canadian provinces of Manitoba, Nova Scotia, Ontario, and Saskatchewan (based on the 1987-1988 membership list of the Association of State Supervisors of Mathematics), for a total of 55 letters. There were 29 responses, and nine states provided usable data: Alabama, Connecticut, Michigan, North Carolina, Oregon, Pennsylvania, South Carolina, Texas, and Wisconsin.

It is possible to obtain several independent effect sizes from a single article if, for example, data from several age groups (in a cross-sectional design) or several ethnic groups are reported. These groups can essentially be regarded as separate samples (Hedges, 1987, personal communication).

The result was 100 usable sources, yielding 259 independent effect sizes. This represents the testing of 3,985,682 subjects (1,968,846 males and 2,016,836 females).
When data from the SATs were excluded (for reasons discussed later), there were 254 effect sizes, representing the testing of 3,175,188 subjects (1,585,712 males and 1,589,476 females).

Coding the Studies

For each study, the following information was recorded: (a) all statistics on gender differences in mathematics performance measure(s), including means and standard deviations or t, F, and df; (b) the number of female and male subjects; (c) the cognitive level of the measure (computation,² concepts, problem solving, and general-mixed); (d) the mathematics content of the test (arithmetic, algebra, geometry, calculus, and mixed-unreported); (e) the age(s) of the subjects (if the article reported no age but reported "undergraduates" or students in an introductory college course, the age was set equal to 19; if a grade level was reported, 5 years was added to that level to yield the age: e.g., third graders were recorded as 8-year-olds); (f) the ethnicity of the sample (Black, Hispanic, Asian American, American Indian, White, Australian, Canadian, or mixed-unreported); (g) the selectivity of the sample (general samples, such as national samples or classrooms; moderately selected samples, such as college students or college-bound students; highly selected samples, such as students at highly selective colleges; samples selected for extreme precocity, such as the Study of Mathematically Precocious Youth; samples selected for poor performance, such as Headstart samples, low socioeconomic status samples, or remedial college samples; and adult nonstudent samples); and (h) the year of publication.

Interrater Reliability

Interrater agreement was computed for ratings of ethnicity, sample selectivity, cognitive level of the test, and mathematics content of the test. The formula used was Scott's (1955) pi coefficient, as recommended by Zwick (1988). Pi was 1.00 for ethnicity, .90 for sample selectivity, .88 for cognitive level, and 1.00 for mathematics content.
Thus, these categories were coded with high reliability.

Statistical Analysis

The effect size computed was d, defined as the mean for males minus the mean for females, divided by the mean within-sexes standard deviation. Thus, positive values of d represent superior male performance and negative values represent superior female performance. Depending on the statistics available for a given study, formulas provided by Hedges and Becker (1986) were used for the computation of d and the homogeneity statistics.

All effect sizes were computed independently by two researchers, Janet Shibley Hyde and an advanced graduate student. There were discrepancies in fewer than 4% of the d values; these were resolved.

All values of d were corrected for bias in estimation of the population effect size, using the formula provided by Hedges (1981).

The complete listing of all studies, with effect sizes, is provided in Table 1.

Results

Magnitude of Gender Differences in Mathematics Performance

Averaged over 259 values, the weighted mean effect size was 0.20. When data from the SATs (Ramist & Arbeiter, 1986) were

¹ Although we tried to sample broadly over the major standardized tests, the number of these tests is great and it was not feasible to report data for all. In some cases, the test publisher was not able to provide the needed data. In other cases, we did not wish to include too many tests by the same publisher with the same format, thereby weighting those tests too greatly. For example, we include the GMAT but not the Law School Admission Test (LSAT) or the Medical College Admission Test (MCAT). All are published by Educational Testing Service and are similar, in the quantitative portion, in content and format. Furthermore, all include selective samples, although it is difficult to assess the degree of selection for mathematics performance. Therefore, we included the GMAT but not the LSAT or MCAT.
Because our major interest was in assessing the magnitude of gender differences in mathematics perfor- mance in the general population, inclusion of data from tests (e.g., the MCAT) based on very selective samples was counterproductive. 2 The definitions of the cognitive levels were as follows: Computation refers to a test that requires the use of only algorithmic procedures to find a single numerical answer. Conceptual refers to a test that involves analysis or comprehension of mathematical ideas. Problem solving re- fers to a test that involves extending knowledge or applying it to new situations. Mixed tests include a combination of items from these cate- gories. (text continues on page 146) 142 J. HYDE, E. FENNEMA, AND S. LAMON Table 1 Studies of Gender Differences in Mathematics Performance (in Alphabetical Order) N Study Advanced Placement Calculus, 1988 (personal communication, Carol Dwyer, January 20, 1989) Alabama Department of Education, 1986- 1987 Alabama Department of Education, 1986- 1987 Alabama Department of Education, 1986- 1987 Alabama Department of Education, 1986- 1987 Alabama Department of Education, 1986- 1987 Alabama Department of Education, 1986- 1987 American College Testing Program, 1970 (American College Testing Program, 1987) American College Testing Program, 1 987 Backman, 1972 Behrens & Verron, 1978 Bell & Ward, 1980 Benbow & Stanley, 1980 Benbow& Stanley, 1980 Benbow & Stanley, 1980 Benbow & Stanley, 1980 Benbow & Stanley, 1980 Benbow & Stanely, 1980 Benbow & Stanley, 1980 Benbow & Stanley, 1980 Benbow & Stanley, 1980 Benbow & Stanley, 1983 Boli, Allen, & Payne, 1985 Brandon et al., 1985 Brandon et al., 1985 Brandon et al., 1985 Brandon et al., 1985 Brandon etal., 1985 Brandon et al., 1985 California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 
1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) California Achievement Test (Green, 1987) Mean age 18 6 7 9 10 13 15 18 18 18 12 12 12 14 12 14 12 14 12 13 13 12 18 9 11 13 9 11 13 6 5 7 8 9 10 11 12 13 14 15 5 6 7 8 9 10 1 ] 12 13 14 15 5 6 7 Male subjects 31,280 34,250 30,419 27,307 25,845 26,657 24,427 11,994 356,704 1,406 155 31 90 133 135 286 372 556 495 1,549 2,046 19,883 689 1,237 1,259 1,137 891 1,000 1,122 959 377 1,953 476 304 351 369 472 411 283 224 374 553 1,316 280 229 224 217 379 278 188 112 2,507 3,649 7.486 Female subjects 22,115 31,336 28,573 26,872 25,095 25,889 25,388 11,664 420,740 1,519 137 41 77 96 88 158 222 369 356 ,249 ,628 19,937 465 ,207 ,176 ,107 857 953 1,087 858 419 2,001 529 331 389 378 465 402 329 275 367 540 1,228 277 228 212 207 332 314 227 132 2,425 3,377 7,353 d> 0.20 -0.02 -0.03 -0.06 -0.07 -0.02 0.00 0.36 0.32 0.92 -0.12 -0.10 0.41 0.76 0.73 0.54 0.43 0.48 0.46 0.44 0.39 0.37 0.55 -0.10 0.02 -0.06 -0.07 -0.11 -0.15 -0.02 -0.13 -0.11 -0.32 -0.16 -0.28 -0.09 -0.38 -0.07 -0.14 -0.08 -0.18 0.09 0.04 -0.12 -0.23 -0.45 -0.07 -0.15 -0.10 -0.30 0.04 -0.09 -0.03 0.01 Ethnic Selectivity Cognitive group" ofsample' leveld 6 5 4 6 1 4 6 1 4 6 1 4 6 1 4 6 1 4 6 1 4 6 2 4 6 2 4 6 4 4 8 1 4 6 1 4 6 4 4 6 4 4 6 4 4 6 4 4 6 4 4 6 4 4 6 4 4 
6 4 4 6 4 4 6 4 4 6 3 4 5 1 3 5 5 3 3 3 2 2 2 2 2 2 2 2 2 2 2 6 6 6 3 3 3 3 3 1 2 2 1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 1 2 1 2 2 1 2 Mathematicscontent" 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 GENDER DIFFERENCES IN MATHEMATICS PERFORMANCE 143 Table 1 (continued) N Study California Achievement Test (Green, 1987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1 987) California Achievement Test (Green, 1987) Carrier, Post, & Heck, 1985 Connecticut Department of Education, 1987 Connecticut Department of Education, 1987 Connecticut Department of Education, 1987 Connor &Serbin, 1980 Connor & Serbin, 1980 Differential Aptitude Test (Bennett, Seashore, &Wesman, 1979) Differential Aptitude Test (Bennett et al., 1 979) Differential Aptitude Test (Bennett et al., 1979) Differential Aptitude Test (Bennett et al., 1979) Differential Aptitude Test (Bennett et al., 1 979) D' Augustine, 1966 D' Augustine, 1966 D' Augustine, 1966 Davis, 1973 Dees, 1982 deWolf, 1981 Dick &Balomenos, 1984 Edge &Friedberg, 1984 Edge &Friedberg, 1984 Engle&Lerch, 1971 Ethington & Wolfle, 1986 Ethington & Wolfle, 1984 Exezidis, 1982 Exezidis, 1982 Exezidis, 1982 Exezidis, 1982 Exezidis, 1982 Exezidis, 1982 Exezidis, 1982 Exezidis, 1982 Exezidis, 1982 Fendrich-Salowey, Buchanan, & Drew, 1982 Fennema & Sherman, 1978 Fennema & Sherman, 1978 Fennema & Sherman, 1978 Fennema & Sherman, 1977 Fennema & Sherman, 1977 Fennema & Sherman, 1977 Fennema & Sherman, 1977 Ferrini-Mundy, 1987 Flaugher, 1971 Flaugher, 1971 Flaugher, 1971 Flaugher, 1971 Flaugher, 1971 Flaugher, 1971 Flexer, 1984 Graduate Management Admission Council, 1987 Graduate Management Admission Council, 1987 Mean age 8 9 10 11 12 13 14 15 9 9 11 13 12 15 13 14 15 16 17 10 11 12 13 15 16 
19 19 19 6 15 18 11 11 11 12 12 12 13 13 13 11 11 12 13 14 15 16 17 19 16 16 16 16 16 16 13 21 23 Male subjects 2,035 1,266 1,402 1,547 2,178 2,010 1,646 1,121 65 15,465 14,504 15,009 71 108 7,000 7,000 6,400 5,350 5,000 29 33 34 45 1,053 962 72 74 158 67 3,610 2,306 80 80 80 80 80 80 80 80 80 12 203 206 223 194 181 199 70 127 1,211 155 207 512 1,120 864 61 2,952 25,048 Female subjects ,925 ,175 ,279 ,429 ,967 ,947 ,748 ,170 79 15,462 14,722 14,919 63 97 6,900 7,350 6,750 5,800 5,350 31 27 26 45 962 1,131 62 51 207 63 4,226 2,807 80 80 80 80 80 80 80 80 80 12 203 225 260 219 169 167 34 122 1,923 151 200 562 1,614 950 63 2,392 17,687 d' -0.11 -0.06 -0.32 -0.08 -0.35 -0.08 -0.32 0.05 -0.43 0.02 -0.02 0.06 0.04 0.23 -0.11 -0.08 0.00 0.03 0.13 -0.59 0.15 -0.09 0.81 0.14 0.38 0.09 -0.15 -0.05 -0.35 0.21 0.27 0.06 -0.11 -0.18 -0.09 -0.15 -0.21 0.00 -0.66 -0.03 0.18 0.30 -0.05 -0.11 0.23 0.35 0.41 0.22 0.06 0.29 0.28 0.27 0.49 0.18 0.33 0.18 0.45 0.43 Ethnic Selectivity Cognitive groupb of sample" leveld 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 5 1 2 5 1 2 5 1 2 6 6 6 6 6 6 6 6 6i 5 3 2 1 5 1 2 1 2 1 2 1 2 1 4 4 4 4 4 4 4 4 3 4 3 4 I 1 4 4 3 3 3 3 3 3 2 2 2 1 3 2 1 4 4 4 4 4 4 4 4 4 4 4 6 3 1 6 3 4 6 3 4 Mathematics content1 5 5 5 5 5 5 5 5 1 5 5 5 3 2 I 1 1 1 1 3 3 3 3 3 5 2 2 2 1 5 5 5 5 5 5 5 5 5 5 5 1 5 5 5 5 5 5 5 2 5 5 5 5 5 5 5 5 5 (Table continues) 144 Table 1 (continued) J. HYDE, E. FENNEMA, AND S. 
LAMON N Study Graduate Management Admission Council, 1987 Graduate Management Admission Council, 1987 Graduate Management Admission Council, 1987 Graduate Management Admission Council, 1987 Graduate Management Admission Council, 1987 Graduate Management Admission Council, 1987 Graduate Management Admission Council, 1987 GRE-Mathematics, 1978 (personal communication, Eldon Park, January 9, 1989) GRE-Q (Educational Testing Service, 1 987) Hancock, 1975 Hanna, 1986 Harnisch & Ryan, 1983 Harris & Romberg, 1974 Hawnetal., 1981 Hawnetal., 1981 Henderson, Landesman, & Kachuck, 1985 Hilton &Berglund, 1974 Hilton & Berglund, 1974 Howe, 1982 Iowa Test of Basic Skills, 1 984 (Lewis & Hoover, 1987) Iowa Test of Basic Skills, 1 984 (Lewis & Hoover, 1 987) Iowa Test of Basic Skills, 1984 (Lewis & Hoover, 1987) Iowa Test of Basic Skills, 1 978 (Lewis & Hoover, 1987) Jacobs, 1973 Jacobs, 1973 Jarvis, 1964 Jerman, 1973 Johnson, 1984 Johnson, 1984 Johnson, 1984 Johnson, 1984 Johnson, 1984 Johnson, 1984 Kaczala, 1983 Kaczala, 1983 Kaczala, 1983 Kaczala, 1983 Kaczala, 1983 Kaplan & Flake, 1982 Kissane, 1986 Kissane, 1986 Kloosterman, 1985 Koffman & Lips, 1980 Lee&Coflrnan, 1974 Lee&Coflman, 1974 Leinhardt, Scewald, & Engel, 1979 Lewis & Hoover, 1983 Lloyd, 1983 Marjoribanks, 1987 Marsh, Smith, & Barnes, 1985 Mean age 25 27 29 33 37 45 55 27 27 14 13 17 10 7 8 15 16 16 13 7 10 13 8 12 17 11 10 19 19 19 19 19 19 10 11 12 13 14 19 13 16 15 30 13 10 7 11 10 11 10 Male subjects 25,855 19,246 19,233 14,088 8,967 4,445 954 1,813 92,722 65 1,773 4,791 195 324 324 45 632 249 40 4,623 5,088 5,085 4,497 40 40 366 107 97 99 58 42 46 49 50 36 52 46 48 18 52 50 63 35 76 93 372 223 497 472 422 Female subjects 15,681 10,078 8,704 6,633 4,570 2,419 397 734 104,922 54 1,750 4,791 196 272 301 36 688 290 40 4,712 5,152 5,148 4,875 40 40 347 133 97 104 67 44 42 58 46 43 53 52 45 76 46 20 61 35 74 61 354 234 466 456 137 d" 0.41 0.42 0.44 0.42 0.39 0.51 0.47 0.77 0.67 0.20 0.17 0.06 -0.25 -0.12 -0.13 -0.28 
0.40 0.33 -0.01 0.00 0.00 0.00 -0.04 0.20 0.67 0.10 -0.06 0.36 0.66 0.37 0.81 0.56 0.88 -0.06 -0.47 -0.21 0.20 -0.27 0.74 0.66 0.49 0.24 0.10 0.09 0.13 -0.12 0.14 0.10 0.11 -0.30 Ethnic group" 6 6 6 6 6 6 6 6 6 6 8 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 6 6 6 6 6 6 6 7 7 Selectivity of sample" 3 3 3 3 3 3 3 9 3 1 | 1 1 I 1 I 3 0 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 I 1 1 2 3 3 1 NA I 1 1 1 1 1 I Cognitive leveld 4 4 4 4 4 4 4 3 4 4 4 4 2 4 4 4 3 3 4 1 2 3 3 4 4 3 1 3 3 3 3 3 3 4 4 4 4 4 4 3 3 2 4 4 4 4 4 4 4 4 Mathematics contente 5 5 5 5 5 5 5 5 5 5 3 5 3 5 5 1 5 5 5 5 5 5 5 5 5 1 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 3 5 5 (Table continues) GENDER DIFFERENCES IN MATHEMATICS PERFORMANCE 145 Table 1 (continued) N Study Marshall* Smith, 1987 Meyer, 1978 Michigan Department of Education, 1987 Michigan Department of Education, 1987 Michigan Department of Education, 1 987 Mills, 1981 Moore &Smith, 1987 Moore* Smith, 1987 Moore & Smith, 1987 Moore & Smith, 1987 Moore & Smith, 1987 Moore & Smith, 1987 Moore & Smith, 1987 Moore & Smith, 1987 Moore & Smith, 1987 Muscio, 1962 National Assessment of Educational Progress [NAEP], 1978 (Dossey, Mullis, Lindquist, & Chambers, 1988)NAEP, 1978 (Dossey etal., 1988) NAEP, 1978 (Dossey etal., 1988) NAEP, 1986 (Dossey etal., 1988) NAEP, 1986 (Dossey etal., 1988) NAEP, 1986 (Dossey etal., 1988) Newman, 1984 North Carolina Department of Public Instruction, 1987 North Carolina Department of Public Instruction, 1987 North Carolina Department of Public Instruction, 1987 Oregon Department of Education, 1987 Parsley, Powell, O'Connor, & Deutsch, 1 963 Parsley etal., 1963 Parslev etal., 1963 Parsley etal., 1963 Parsley et al., 1963 Parsley etal., 1963 Parsley etal., 1963 Pattison & Grieve, 1984 Pattison & Grieve, 1984 Pattison & Grieve, 1984 Pederson, Shinedling, & Johnson, 1968 Pennsylvania Department of Education, 1987 Pennsylvania Department of Education, 1987 Pennsylvania Department of Education, 1987 Plake, Ansorge, 
Parker, & Lowry, 1982 Powell & Steelman, 1983 Randhawa & Hunt, 1987 Randhawa & Hunt, 1987 Randhawa & Hunt, 1987 Rosenberg & Sutton-Smith, 1969 Saltzen, 1982 Saltzen, 1982 Saltzen, 1982 Saltzen, 1982 SAT Mathematics Level 1 (Ramist&Arbeiter, 1986) SAT Mathematics Level 2 (Ramist & Arbeiter, 1986) Schonberger, 1981 Schratz, 1978 Schratz, 1978 Mean age 11 9 9 12 15 13 19 19 19 19 19 19 19 19 19 11 9 13 17 9 13 17 7 8 11 13 13 7 8 9 10 11 12 13 15 18 18 8 8 10 13 19 21 9 12 15 20 7 10 7 10 18 18 19 9 9 Male subjects 3,750 97 2,486 2,391 2,435 42 316 668 212 314 971 118 95 454 57 206 3,688 6,052 6,689 1,733 1,550 967 82 41,053 41,279 42,817 1,027 379 379 379 379 379 379 379 192 31 91 12 52,228 49,851 55,384 26 30 675 790 859 355 92 104 76 122 71,881 28,890 34 20 20 Female subjects 3,650 82 2,479 2,563 2,520 73 247 553 207 365 1078 137 161 532 62 207 3,688 6,052 6,689 1,733 1,550 967 61 38,439 38,855 40,938 1,028 338 338 383 383 383 338 338 156 11 95 12 52,150 50,184 54,309 31 21 654 706 900 658 75 80 77 144 76,373 17,000 23 20 20 d' -0.12 -0.14 -0.15 -0.09 -0.15 0.18 -0.04 0.11 0.08 0.31 0.41 0.45 0.28 0.48 0.76 0.21 -0.08 -0.03 0.22 0.00 0.07 0.18 -0.20 -0.12 -0.24 -0.26 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.29 0.05 0.33 -0.70 -0.02 -0.06 0.00 0.53 0.83 0.16 0.06 -0.06 0.16 -0.13 -0.39 0.08 0.15 0.40 0.38 0.48 -0.34 0.03 Ethnic group" 6 6 6 6 6 6 1 5 2 1 5 2 1 5 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 6 6 6 6 6 6 8 8 8 6 6 6 6 6 6 6 6 2 [ Selectivity of sample' 1 1 1 1 1 1 0 0 0 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 2 1 1 1 1 2 2 0 0 0 Cognitive Mathematics level11 content5 1 3 1 1 1 4 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 3 3 3 1 4 4 4 4 4 2 3 1 4 1 1 2 2 4 4 2 2 2 (Tab! 1 5 5 5 5 5 5 5 5 5 5 5 5 1 5 5 5 5 1 1 1 1 1 1 1 3 3 3 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 2 5 5 » continues) 146 Table 1 (continued) J. HYDE, E. FENNEMA, AND S. 
LAMON

Study: Schratz, 1978; Schratz, 1978; Schratz, 1978; Schratz, 1978; Senk & Usiskin, 1983; Senk, 1982; Senk, 1982; Senk, 1982; Sheehan, 1968; South Carolina Department of Education, 1987; South Carolina Department of Education, 1987; South Carolina Department of Education, 1987; South Carolina Department of Education, 1987; South Carolina Department of Education, 1987; Steel, 1978; Swafford, 1980; Texas Education Agency, 1987; Todd, 1985; Usiskin, 1972; Usiskin, 1972; Verbeke, 1982; Verbeke, 1982; Verbeke, 1982; Webb, 1984; Weiner, 1983; Whigham, 1985; Whigham, 1985; Whigham, 1985; Whigham, 1985; Wisconsin Department of Public Instruction, 1984; Wisconsin Department of Public Instruction, 1984; Wisconsin Department of Public Instruction, 1984; Wozencraft, 1963; Wozencraft, 1963; Wrabel, 1985; Yawkey, 1981; Zahn, 1966
Mean age: 9 14 14 14 16 16 16 16 14 9 10 12 14 16 18 14 NA 9 15 15 13 15 16 13 13 20 20 20 20 9 13 17 8 11 15 5 13
Male subjects (N): 20 20 20 20 674 266 268 245 57 22,531 21,622 23,390 25,559 18,778 546 294 95,168 63 87 74 17 14 10 44 43 63 123 88 20 871 783 691 282 301 99 48 14
Female subjects (N): 20 20 20 20 690 240 240 261 50 22,313 21,076 22,513 24,370 19,627 621 329 97,366 60 67 75 23 12 14 33 27 54 115 89 26 867 761 750 282 302 103 48 13
d [a]: 0.08 -0.89 -0.05 0.45 0.05 0.04 0.12 -0.03 -0.04 -0.26 -0.01 -0.29 -0.04 -0.02 0.01 -0.09 0.03 -0.21 0.33 0.30 -0.49 0.46 0.00 0.15 0.32 -0.19 -0.05 -0.09 0.08 -0.10 0.06 -0.17 -0.23 -0.15 -0.06 -0.42 0.86
Ethnic group [b]: 5 2 1 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
Selectivity of sample [c]: 0 0 0 0 1 1 1 1 1 1 1 1 1 1 4 4 4 2 4 2 2 2 2 1 1 1 1 1 1 1
Cognitive level [d]: 2 2 2 2 4 3 3 3 3 1 2 1 2 1 4 1 4 4 2 2 4 4 4 4 4 4 4 4 4 1 2 1 3 1 4 4 2
Mathematics content [e]: 5 5 5 5 3 3 3 3 2 5 5 5 5 5 5 1 5 5 3 3 5 5 5 5 5 2 2 3 4 5 5 5 1 1 5 1 1

a Positive values reflect better performance by males; negative values reflect better performance by females.
b 1 = Black, 2 = Hispanic, 3 = Asian American, 5 = White, 6 = mixed or unreported, 7 = Australian, 8 = Canadian, 9 = American Indian.
c 0 = selected for low performance, 1 = general samples, 2 = moderately selected, 3 = highly selected, 4 = highly precocious samples.
d 1 = computation, 2 = understanding of concepts, 3 = problem solving, 4 = mixed or unreported.
e 1 = arithmetic, 2 = algebra, 3 = geometry, 4 = calculus, 5 = mixed or unreported.

excluded, the remaining 254 effect sizes yielded a weighted mean d of 0.15. In both cases, this small positive value indicates that, overall, males outperformed females by a small amount. When one looks just at samples of the general population, d was -0.05, reflecting a superiority in female performance, but of negligible magnitude.

We excluded the SAT data from the remainder of the meta-analysis for the following reason. The number of subjects in this group was so enormous (810,494) that they accounted for 20% of all subjects and, in a weighted means analysis, they exerted a disproportionate effect. We reserve a separate section of the discussion for the SAT data.

Overall, 131 (51%) of the 259 effect sizes were positive, reflecting superior male performance; 17 (6%) were exactly zero; and 111 (43%) were negative, reflecting superior female performance.

Table 2
Magnitude of Gender Differences as a Function of the Cognitive Level of the Test

Cognitive level         k      d      95% confidence interval for d    H
Computation             45    -0.14   -0.14 to -0.13                   1,144*
Concepts                41    -0.03   -0.04 to -0.02                   118*
Problem solving         48     0.08    0.07 to 0.10                    703*
Mixed or unreported     120    0.19    0.18 to 0.19                    39,557*

Note. k represents the number of effect sizes; H is the within-groups homogeneity statistic (Hedges & Becker, 1986).
* Significant nonhomogeneity at p < .05, according to chi-square test. All other categories are homogeneous.
Homogeneity analyses using procedures specified by Hedges and Becker (1986) indicated that the set of 254 effect sizes was significantly nonhomogeneous, H = 49,001.09, compared with a critical value of χ²(253) = 300 (approximation), p < .0001. Therefore, we concluded that the set of effect sizes is heterogeneous and we sought to partition the set of studies into more homogeneous subgroups, using factors that we hypothesized would predict effect size. These factors are ones that have previously been shown to be important moderators of gender differences in mathematics performance (e.g., Fennema, 1974; Stage et al., 1985). Subsequently, we performed regression analyses to determine which variables are the best predictors of variations in d.

Cognitive Level

The results of the analysis of effect sizes, arranged according to the cognitive level of the test, are shown in Table 2. As in the overall analysis, the effect sizes are small. There is a slight female superiority in computation, no gender difference in understanding of concepts, and a slight male superiority in problem solving. Oddly, the gender difference for tests with a mixture of cognitive levels (or no report of cognitive level) is largest, although still less than 0.25 standard deviation.

Homogeneity analyses indicate that there are significant differences between the four effect sizes shown in Table 2; the between-groups homogeneity statistic (HB) was 7,479 compared with a critical χ²(3) = 7.81. However, it should be noted that the number of subjects and the number of effect sizes in this analysis are so great that small differences can be significant. In the succeeding analyses, HBs can be compared to see which between-groups effects are strongest. The cognitive-level effect is a large one compared with the others.
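The homogeneity test described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual computation: it assumes the usual large-sample variance approximation for d and inverse-variance weights, and the three input rows are the Senk (1982) entries of Table 1 as read by position from the flattened table. The function and variable names are hypothetical.

```python
def d_variance(d, n_male, n_female):
    """Large-sample variance of d for two independent groups (assumed formula)."""
    n_total = n_male + n_female
    return n_total / (n_male * n_female) + d * d / (2.0 * n_total)


def weighted_mean_and_h(effects):
    """effects: list of (d, n_male, n_female). Returns (weighted mean d, H)."""
    weights = [1.0 / d_variance(d, nm, nf) for d, nm, nf in effects]
    total_weight = sum(weights)
    mean_d = sum(w * e[0] for w, e in zip(weights, effects)) / total_weight
    # H is the weighted sum of squared deviations from the weighted mean.
    h = sum(w * (e[0] - mean_d) ** 2 for w, e in zip(weights, effects))
    return mean_d, h


# Illustrative subset only: (d, male N, female N) for the three Senk (1982) rows.
senk_rows = [(0.04, 266, 240), (0.12, 268, 240), (-0.03, 245, 261)]
mean_d, h = weighted_mean_and_h(senk_rows)

# With k = 3 effect sizes, H is referred to chi-square with k - 1 = 2 df.
CHI2_CRIT_DF2 = 5.99  # critical value at p = .05
is_homogeneous = h < CHI2_CRIT_DF2
print(round(mean_d, 3), round(h, 2), is_homogeneous)
```

For a small, similar group of studies like this one, H stays well under the critical value; across 254 diverse effect sizes based on very large samples, the same statistic becomes enormous, which is the pattern the text reports.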
Mathematics Content of the Tests

The analysis according to the mathematics content of the tests was less successful because so many studies failed to report the mathematics content or used tests with a mixture of content. The results of the analysis are shown in Table 3. They indicate that there was no gender difference in arithmetic or algebra performance. The male superiority in geometry was small (0.13), and the tests with mixed content showed the largest gender difference.

Homogeneity analyses indicated that there was a significant difference between the effect sizes for the different types of math content, HB = 548 compared against a critical χ²(4) = 9.49. This between-groups difference was smaller than most of the others.

Age Differences

The ages were divided into five subgroups: (a) 5- to 10-year-olds, (b) 11- to 14-year-olds, (c) 15- to 18-year-olds, (d) 19- to 25-year-olds, and (e) those 26 and older. These age groupings were chosen for two reasons. First, they correspond roughly to elementary school, middle or junior high school, high school, college, and adulthood. Second, some reviewers have asserted that there is no gender difference in mathematics performance until the age of 12, when it begins to emerge (e.g., Maccoby & Jacklin, 1974). Other reviewers believe that the difference does not emerge until the last 2 or 3 years of high school (e.g., Meece et al., 1982; Stage et al., 1985). Thus, it was important to have age categories reflecting these two hypotheses.

The results of the analysis for age categories are shown in Table 4. Overall, there was a small female superiority in the elementary and middle school years. There was a more substantial male superiority in the high school years, the college years, and beyond, although this last finding is based on relatively few effect sizes, most of them from the GRE.
Homogeneity analyses indicate that there are significant differences in the magnitude of the gender difference as a function of age group, HB = 37,669 compared with a critical χ²(4) = 9.49. The age effect is strong.

The results of the analysis of Age × Cognitive Level of the Test interaction are also shown in Table 4. Females were superior in computation in elementary school and middle school, although all differences were small. There was essentially no gender difference at any age level in understanding of mathematical concepts. Problem solving, on the other hand, presents a different picture. There was a slight female superiority or no gender difference in the elementary and middle school groups; however, a moderate gender difference favoring males was found in the high school and college groups.

Table 3
Magnitude of Gender Differences as a Function of the Mathematics Content of the Test

Mathematics content     k      d      95% confidence interval for d    H
Arithmetic              35     0.00   -0.02 to 0.01                    368*
Algebra                 9      0.02   -0.08 to 0.11                    8
Geometry                19     0.13    0.09 to 0.16                    47*
Calculus                2      0.20    0.18 to 0.22                    0.17
Mixed or unreported     190    0.15    0.15 to 0.15                    48,064*

Note. k represents the number of effect sizes; H is the within-groups homogeneity statistic (Hedges & Becker, 1986).
* Significant nonhomogeneity at p < .05, according to chi-square test. All other categories are homogeneous.

Table 4
Magnitude of Gender Differences as a Function of Age and Cognitive Level of the Test

                                    Cognitive level
Age group       All studies      Computation    Concepts       Problem solving
5-10            -0.06 (6?)       -0.20 (30)     -0.02 (33)     0.00 (11)
11-14           -0.07 (93)       -0.22 (38)     -0.06 (28)     -0.02 (21)
15-18            0.29 [a] (53)    0.00 (12)      0.07 (9)      0.29 (10)
19-25            0.41 (?)         NA             NA            0.32 (15)
26 and older     0.59 (9)         NA             NA            NA

Note. NA = not available: there were two or fewer effect sizes, so a mean could not be computed. k is shown in parentheses, where k = number of effect sizes on which the computation of the mean was based.
a Data for the Scholastic Aptitude Test were excluded in the computation of this effect size.

Table 6
Magnitude of the Gender Difference as a Function of the Selectivity of the Sample

Sample                          k      d      95% confidence interval for d    H
General                         184   -0.05   -0.06 to -0.05                   5,461*
Moderately selective            24     0.33    0.33 to 0.34                    290*
Highly selective                18     0.54    0.53 to 0.54                    1,674*
Precocious                      15     0.41    0.39 to 0.43                    211*
Selected for low performance    12     0.11    0.04 to 0.18                    24*

Note. k represents the number of effect sizes; H is the within-groups homogeneity statistic (Hedges & Becker, 1986).
* Significant nonhomogeneity at p < .05, according to chi-square test.

Ethnicity

The results for the analysis of gender differences as a function of ethnicity are shown in Table 5. Data for the SAT are provided by ethnic group and were coded in that manner for the present meta-analysis. Two effect sizes are provided: d1 is the mean of all effect sizes including the SAT, and d2 is the mean of effect sizes excluding the SAT.

When the SAT data were excluded, there was essentially no gender difference in mathematics performance for Blacks, Hispanics, and Asian Americans.
Indeed, the 95% confidence interval for d covers 0 for both Blacks and Hispanics. The slight difference for Asian Americans favored females. Only for White Americans was there evidence of superior male performance, and the difference was still small. The mean effect size for American Indians should not be taken too seriously because it is based on a single value.

Homogeneity analyses, using the data set excluding the SAT, indicated that there were significant differences between ethnic groups in the magnitude of the gender difference, HB = 293 compared with a critical χ²(6) = 12.59. Ethnicity was not one of the stronger effects.

Table 5
Magnitude of Gender Differences as a Function of Ethnicity

Ethnic group            d1 (k)          d2 (k)          H
Black                   0.23 (22)       -0.02 (21)      219*
Hispanic                0.30 (21)        0.00 (20)      157*
Asian American          0.29 (5)        -0.09 (4)       15*
White                   0.41 (14)        0.13 (13)      152*
Australian              0.11 (7)         0.11 (7)       31*
Canadian                0.09 (5)         0.09 (5)       21*
American Indian         0.44 (1)         NA
Mixed or unreported     0.15 (184)       0.15 (184)     48,114*

Note. NA = not available; no effect size was available in this category. d1 = the mean for all effect sizes, d2 = the mean effect size excluding Scholastic Aptitude Test (SAT) data, H = homogeneity statistic based on data excluding the SAT. All samples are from the United States unless otherwise indicated. k, the number of effect sizes on which each mean is based, is shown in parentheses.
* Significant nonhomogeneity at p < .05, according to chi-square test.

Selectivity of the Sample

The analysis for the magnitude of the gender difference as a function of the selectivity of the sample is shown in Table 6. Notice that the gender difference was close to zero (favoring females slightly) for general samples; a larger gender difference favoring males was found for each successive level of selection for higher ability. The gender difference was moderate to large for highly selected samples (d = 0.54) and for samples selected for extreme precocity (d = 0.41).
Also note that the great majority of samples (184) in this meta-analysis were general and unselected. Not surprisingly, the greatest heterogeneity of effect sizes was for the general samples.

Homogeneity analyses indicated that there were significant differences in effect size depending on how selective the sample was, HB = 41,341 compared with a critical χ²(4) = 9.49. Sample selectivity was one of the large effects.

When the interaction of sample selectivity and cognitive level was examined, it was apparent that the effects of sample selectivity were found most strongly for problem solving. For such measures, the magnitude of the gender difference varied from 0.02 for general samples to 0.43 for highly selected samples.

Year of Publication

Studies were divided into two subgroups depending on the year of publication: those published in 1973 or earlier and those published after 1973. We chose 1973 as a divider between older studies and more recent ones because it marked the last year that was included in the Maccoby and Jacklin (1974) and Fennema (1974) reviews.

For studies published in 1973 and earlier, d was 0.31, based on 37 effect sizes. For studies published in 1974 or later, d was 0.14, based on 217 effect sizes. Thus, the data show both the increase in research on gender and mathematics and a substantial trend for smaller gender differences in more recent studies.

Regression Analysis

In view of the fact that the first homogeneity analysis indicated that, overall, the set of effect sizes was nonhomogeneous, multiple regression analysis was used to construct a model of the sources of variation in effect sizes (Hedges & Becker, 1986). The effect size was the criterion variable.
On the basis of the results of the categorical analyses reported previously, we performed an initial regression analysis using the following predictors: age of subjects, year of publication, ethnicity of sample, selectivity of sample, cognitive level of the test, mathematics content of the test, and the Age × Cognitive Level interaction. The regression analyses were conducted by using the GLM procedure in the SAS statistics program. Repeated regression analyses indicated that the SAT data were having a disproportionate effect on the results, particularly in terms of the strength of the ethnicity variable, because of the large sample size. Thus, the SAT data were deleted in the final multiple regression analysis. In addition, those few studies in which the sample had been selected for poor performance were also deleted, because they did not fit conceptually with the ratings of samples for increasingly greater selectivity for high performance. For the final regression analysis, predictors that were nonsignificant in previous analyses were deleted.

The result was a simple, well-defined equation in which 87% of the variance in d was predicted by three variables: subjects' age, selectivity of the sample, and cognitive level of the test. All three were significant predictors; age was the strongest predictor, F(1, 232) = 1,171.04, p < .0001, followed by sample selectivity, F(3, 232) = 113.22, p < .0001, which was followed by cognitive level, F(3, 232) = 7.88, p < .0001. (Sample selectivity and cognitive level were coded as class variables.)

Discussion

Averaged over all studies, the mean magnitude of the gender difference in mathematics performance was 0.20. When SAT data were excluded, d was 0.15. The positive value indicates better performance by males on the average, but the magnitude of the effect size is small. Figure 1 shows two normal distributions that are 0.15 standard deviation apart.
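To give a numerical sense of how close two such distributions are, the sketch below computes their overlap and the probability that a randomly chosen score from the higher distribution exceeds one from the lower. It assumes both distributions are normal with equal variance, as in Figure 1; the function names are hypothetical and the calculation is an illustration, not part of the original analysis.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def overlap(d):
    """Overlapping area of two unit-variance normal curves whose means differ by d."""
    return 2.0 * normal_cdf(-abs(d) / 2.0)


def prob_higher(d):
    """Probability that a random draw from the higher-mean group exceeds
    a random draw from the lower-mean group."""
    return normal_cdf(abs(d) / math.sqrt(2.0))


shared_area = overlap(0.15)
p_higher = prob_higher(0.15)
print(round(shared_area, 3), round(p_higher, 3))
```

Under these assumptions the two curves share about 94% of their area, and a randomly chosen score from the higher-scoring group exceeds one from the lower-scoring group only about 54% of the time.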
If one looks only at samples of the general population (excluding selective samples), d was -0.05, indicating a female superiority in performance, but one of negligible magnitude. We can place considerable confidence in these results because they are based on testing literally millions of subjects, on more than 200 effect sizes, and on many well-sampled, large studies such as the state assessments.

Figure 1. Two normal distributions that are 0.15 standard deviations apart (i.e., d = 0.15). This is the approximate magnitude of the gender difference in mathematics performance, averaging over all samples. (Horizontal axis: z score, from -4 to 4.)

These findings are in contrast to the results of Hyde's (1981) earlier meta-analysis, in which she reported a d of 0.43 for quantitative ability. The discrepancy may be accounted for in two ways. First, her computation was based on a small sample of studies taken from the Maccoby and Jacklin (1974) review; sufficient information was available for the computation of only seven values of d. In addition, to test Maccoby and Jacklin's hypothesis that gender differences in mathematics performance emerge around the age of 12 or 13, only studies with subjects 12 years old or older were included. Using only that set of studies probably produced a larger gender difference than if studies with younger subjects had also been included. Second, the present meta-analysis provides evidence that the magnitude of gender differences has declined over the past three decades. We found that d was 0.31 for studies published in 1973 or earlier and 0.14 for studies published in 1974 or later. Thus, there probably has been a decline in the gender difference since 1973. These findings are consistent with those of Feingold (1988), who documented a decline in the magnitude of gender differences in abilities as measured by several standardized tests.

It is important to recognize that the set of effect sizes is not homogeneous.
It is therefore essential to consider variations in the magnitude of the gender difference as a function of the three variables that were significant predictors in the multiple regression analyses: age, selectivity of the sample, and cognitive level of the test.

Age Trends and Cognitive Level

Age trends in the magnitude of the gender difference in mathematics performance are important. Averaging over all studies, there was a slight female superiority in performance in the elementary and middle school years. A moderate male superiority emerged in the high school years (d = 0.29) and continued in the college years (d = 0.41), as well as in adulthood (d = 0.59).

However, the age trends were a function of the cognitive level tapped by the test. Females were superior in computation in elementary and middle school, and the difference was essentially zero in the high school years. The gender difference was essentially zero for understanding of mathematical concepts at all ages for which data were available. It was in problem solving that dramatic age trends emerged. The gender difference in problem solving favored females slightly (effect size essentially zero) in the elementary and middle school years, but in the high school and college years there was a moderate effect size favoring males. These are precisely the years when students are permitted to select their own courses, and females elect somewhat fewer mathematics courses than do males (Meece et al., 1982). Differences in course selection appear to account for some but not all of the gender difference in performance on standardized tests in the high school and college years (Kimball, 1989).

We are puzzled by the fact that tests with mixed or unreported cognitive levels had a slightly larger gender difference (0.19) than tests of problem solving (0.08).
One possible explanation is that there may be some feature of the format or administration of these tests, about which we lacked information, that produced a male advantage on the tests. For example, the content of problem-solving items on those tests may have heavy representation of masculine-stereotyped content, which has been shown to produce better performance by males in some studies, although results on the issue are mixed (e.g., Donlon, 1973; Selkow, 1984).

Sample Selectivity

Sample selectivity was one of the three most powerful predictors of effect size in the multiple regression analysis. When all effect sizes (excluding the SAT) were averaged, d was 0.15. Yet when only those 184 effect sizes based on general, unselected populations were averaged, d was -0.05. That is, there was a shift to a slight female advantage, although the difference was essentially zero. The magnitude of the gender difference favoring males grew larger as the sample was more highly selected: d was 0.33 for moderately selected samples (such as college students), 0.54 for highly selected samples (such as students at highly selective colleges, or graduate students), and 0.41 for samples selected for exceptional mathematical precocity.

These findings are very helpful in interpreting the results of Benbow and Stanley's (1980, 1983) study of mathematically precocious youth. Their research has found large gender differences favoring males in mathematics performance, and the results have been widely publicized. Often the secondary reports fail to acknowledge the specialized sampling in the study, implying that the large gender differences are true of the general population. The results of the present meta-analysis demonstrate empirically exactly what would be expected from a consideration of normal distributions (Hyde, 1981): Large gender differences can be found at the extreme tails of distributions even though the gender difference for the entire population is small.
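The point about tails can be checked with a short calculation. The sketch below is an illustration under stated assumptions, not a reanalysis: it treats both groups as normal with equal variance, places the lower-scoring group at mean 0 and the higher-scoring group at mean d, and asks how the ratio of the two groups above a cutoff changes as the cutoff moves out. The function names and the chosen cutoffs are hypothetical.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(d, cutoff):
    """Ratio of the proportion of the higher-mean group above `cutoff`
    to the proportion of the lower-mean group above it.
    The lower-mean group is centered at 0, the higher-mean group at d,
    and the cutoff is expressed in standard deviation units."""
    return upper_tail(cutoff - d) / upper_tail(cutoff)


# With the overall d of 0.15, the ratio grows as the cutoff becomes more extreme.
ratios = {cutoff: tail_ratio(0.15, cutoff) for cutoff in (0.0, 1.0, 2.0, 3.0)}
for cutoff, ratio in ratios.items():
    print(cutoff, round(ratio, 2))
```

Under these assumptions the ratio is close to 1 at the mean and rises steadily toward the extreme tail, so a sample drawn only from the top of the distribution can show a large group imbalance even when the population difference is small.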
Certainly it is important to study gifted populations, but it is essential to remember that results from studies like Benbow and Stanley's do not generalize to the rest of the population.

We must raise one caveat about studies that were coded as unselected samples of the general population. In high school, males have a higher dropout rate than females (Ekstrom, Goertz, Pollack, & Rock, 1986). Dropouts tend to be low scorers, and they are not included in data based on the testing of high school students. Thus, male advantages in performance in high school and later may in part result from the selective loss of low-scoring males from the samples.

The SAT-Math

A recent meta-analysis of gender differences in verbal ability (Hyde & Linn, 1988) indicated that the SAT-Verbal produced idiosyncratic results. The average of all effect sizes yielded a d of 0.11, indicating a slight female superiority in performance, although the authors concluded that the gender difference had essentially become zero. Yet the SAT-Verbal produced a d of -.11 (the negative sign reflecting superior male performance in that meta-analysis). That is, the SAT yielded superior male performance when the pattern over all other tests was a slight female superiority in performance.

The SAT-Math also yielded discrepant results in the present analysis. The overall effect size, excluding the SAT, was 0.15. Yet, according to the data from the 1985 administration of the SAT (Ramist & Arbeiter, 1986), for males the mean was 499 (SD = 121), and for females the mean was 452 (SD = 112), resulting in a d of .40. That is, the SAT produced a considerably larger gender difference than our overall meta-analysis found.

The larger gender difference favoring males on the SAT may be due to several factors:

1. The SAT data are based on a moderately selected sample, those who are college-bound. As we indicated earlier, sample selectivity increases the magnitude of the gender difference.
For moderately selected samples excluding the SAT, d was 0.33.

2. As Hyde and Linn (1988) pointed out, a larger number of females take the SAT, and the males appear to be a somewhat more advantaged sample in terms of parental income, father's education, and attendance at private schools (Ramist & Arbeiter, 1986). In short, the male SAT sample may be more highly selected than the female sample.

3. There may be features of the content of the test itself or of its administration that enlarge the difference between males and females. For example, the present meta-analysis indicates that gender differences are larger in the high school years for measures of problem solving but not for computation. Although the SAT includes many items that tap problem solving, there also are some purely computational items (see Footnote 3). The SAT was coded as "mixed" in our cognitive-level analysis. The mixture of problem solving and computational items should produce a gender difference favoring males, but it should be smaller than 0.40.

How Large Are the Gender Differences in Mathematics Performance?

The interpretation of the magnitude of effect sizes has been debated. Cohen (1969) considered a d of 0.20 small, a d of 0.50 medium, and a d of 0.80 large. On the other hand, Rosenthal and Rubin (1982b) have introduced the binomial effect size display as a means of translating effect sizes into practical significance. For example, an effect size reported for success in curing cancer, reported as a correlation of .20, translates into increasing the cure rate from 40% to 60%, surely an important practical effect. Our overall value for samples of the general population, a d of -0.05, translates into a correlation of -.025, which yields only a 3% increase in success rate (from 48.5% to 51.5%).
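The translation from d to the binomial effect size display can be sketched as follows. This is a minimal illustration that assumes two groups of equal size, for which r = d / sqrt(d² + 4), and the usual display rule that the two "success rates" are 0.50 ± r/2. The function names are hypothetical.

```python
import math


def d_to_r(d):
    """Convert d to a correlation, assuming two groups of equal size."""
    return d / math.sqrt(d * d + 4.0)


def besd_rates(d):
    """Binomial effect size display for a d in which positive values favor males.
    Returns (proportion of females above the overall mean,
             proportion of males above the overall mean)."""
    r = d_to_r(d)
    return 0.5 - r / 2.0, 0.5 + r / 2.0


r_general = d_to_r(-0.05)                            # general-population samples
female_general, male_general = besd_rates(-0.05)
female_hs, male_hs = besd_rates(0.29)                # problem solving, high school
print(round(r_general, 3), round(female_general, 3), round(male_general, 3))
print(round(female_hs, 3), round(male_hs, 3))
```

Under these assumptions a d of -0.05 corresponds to a correlation of about -.025 and display rates of roughly 51% versus 49%; carried to more decimals the split is a little narrower than the rounded percentages quoted in the text. A d of 0.29 gives roughly 43% versus 57%.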
Applied to the analysis of gender differences, it means that approximately 51.5% of females score above the mean for the general population, whereas 48.5% of males score above the mean. Thus, the overall effect size is so small that even the binomial effect size display indicates little practical significance.

Footnote 3. An example of a computational item from the SAT is the following: The test taker is asked to tell which of the following quantities is greater or whether the two are equal: (1/3 - [fraction illegible]) and 2/15 (College Entrance Examination Board, 1986).

The effect size of 0.29 for problem solving in high school-aged students translates into 43% of females and 57% of males falling above the mean of the overall distribution, using the binomial effect size display.

Some idea of the magnitude of the overall effect size of -0.05 for general populations or the effect size of 0.29 for problem solving in high school students can also be gained by comparing them with effect sizes found in other meta-analyses. For example, a meta-analysis of gender differences in verbal ability found d to be 0.11, and the authors concluded that the value was so small as to indicate no difference (Hyde & Linn, 1988). A meta-analysis of gender differences in spatial ability indicated that the magnitude of the gender difference depended considerably on the type of spatial ability tested (Linn & Petersen, 1985). For measures of spatial perception (e.g., the rod-and-frame test), d was 0.44. For measures of spatial visualization (e.g., Hidden Figures Test), d was 0.13. For measures of mental rotation (e.g., PMA Space or the Vandenberg), d was 0.73. In all cases the differences favored males. Linn and Petersen concluded that the only substantial gender difference was in measures of mental rotation.
Meta-analyses in the realm of social behavior have indicated that d was .50 for gender differences in aggression, including studies with subjects of all ages (Hyde, 1984). For social-psychological studies of aggression by adult subjects, d was .40 (Eagly & Steffen, 1986). For gender differences in helping behavior, d was .13, although the effect sizes were extremely heterogeneous and d varied, for example, from -0.18 for studies conducted in the laboratory to 0.50 for studies conducted off campus (Eagly & Crowley, 1986).

One can also compare the magnitude of the gender difference with effects that have been obtained outside the realm of gender differences. For example, the average effect of psychotherapy, comparing treated with control groups, is .68 (Smith & Glass, 1977).

Thus, the overall effect size of 0.15 (or -0.05 for samples of only the general population) for gender differences in mathematics performance can surely be called small. The largest effect sizes we obtained were 0.29 and 0.32 for problem solving in the high school and college years, respectively. These are moderate differences that are comparable, for example, to the gender difference in aggressive behavior, yet they are smaller than the effects of psychotherapy.

Implications

This meta-analysis provided little support for the global conclusions that "boys excel in mathematical ability" (Maccoby & Jacklin, 1974, p. 352) or "the finding that males outperform females in tests of quantitative or mathematical ability is robust" (Halpern, 1986, p. 57). The overall gender difference is small at most (d = 0.15 for all samples or -0.05 for general samples). Furthermore, a general statement about gender differences is misleading because it masks the complexity of the pattern. For example, females are superior in computation, there are no gender differences in understanding of mathematical concepts, and gender differences favoring males in problem solving do not emerge until the high school years.
However, where gender differences do exist, they are in critical areas. It is important for us to know that females begin in high school to perform less well than males on mathematical problem-solving tasks. Problem solving is critical for success in many mathematics-related fields, such as engineering and physics. In this sense, mathematics skills may continue to be a critical filter. The curriculum in mathematics, beginning well before high school, should emphasize problem solving for all students (National Council of Teachers of Mathematics, 1988). Currently, it emphasizes computation, and girls seem to learn that very well. The schools must take more responsibility in the teaching of problem solving, both because it is an important area of mathematics and because it is an issue of gender equity. Boys may have more access to problem-solving experiences outside the mathematics classroom than do girls, creating boys' pattern of better performance (Kimball, 1989). For example, data from California high schools from 1983 to 1987 indicate that girls made up only about 38% of physics students, 34% of advanced physics students, and 42% of chemistry students (Linn & Hyde, in press). These science courses are likely to provide extensive experience with problem solving, and fewer girls than boys gain that experience.

The gender difference that was found on the SAT-Math also has significant implications. Scores on the SAT are used as criteria for college admission and for selection of scholarship recipients. Thus, lower SAT-Math scores may influence these critical decisions about female students. The format and items of the SAT-Math should continue to be inspected for two purposes: (a) to determine whether some items are gender-biased and should be eliminated from the test, and (b) to determine whether certain items tap important problem-solving skills that are not taught adequately in the mathematics curriculum of the schools.
Then schools will be able to take positive steps to improve the teaching of the mathematics required to solve such problems.

One frustration that occurred in the process of conducting this meta-analysis was the difficulty of analyzing the results according to the mathematics content of the test. Few authors specified the content clearly, probably because the content was mixed. We must know if there are large gender gaps for certain types of content. That can be determined only when researchers construct tests and report results that assess the various kinds of mathematics content separately.

Nonetheless, the gender differences in mathematics performance, even among college students or college-bound students, are at most moderate. Thus, in explaining the lesser presence of women in college-level mathematics courses and in mathematics-related occupations, we must look to other factors, such as internalized belief systems about mathematics, external factors such as sex discrimination in education and in employment (Kimball, 1989), and the mathematics curriculum at the precollege level.

References

Anastasi, A. (1958). Differential psychology (3rd ed.). New York: Macmillan.
Benbow, C. P., & Stanley, J. C. (1980). Sex differences in mathematical ability: Fact or artifact? Science, 210, 1262-1264.
Benbow, C. P., & Stanley, J. C. (1983). Sex differences in mathematical reasoning ability: More facts. Science, 222, 1029-1031.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.
College Entrance Examination Board. (1986). 10 SATs. New York: Author.
Donlon, T. F. (1973). Content factors in sex differences on test questions (ETS RB 73-28). Princeton, NJ: Educational Testing Service.
Dossey, J. A., Mullis, I. V. S., Lindquist, M. M., & Chambers, D. L. (1988). The mathematics report card: Are we measuring up? (1986 National Assessment of Educational Progress Report No. 17-M-01).
Princeton, NJ: Educational Testing Service.
Eagly, A. H., & Crowley, M. (1986). Gender and helping behavior: A meta-analytic review of the social psychological literature. Psychological Bulletin, 100, 283-308.
Eagly, A. H., & Steffen, V. J. (1986). Gender and aggressive behavior: A meta-analytic review of the social psychological literature. Psychological Bulletin, 100, 309-330.
Eccles, J. S. (1987). Gender roles and women's achievement-related decisions. Psychology of Women Quarterly, 11, 135-172.
Ekstrom, R., Goertz, M. E., Pollack, J. M., & Rock, D. A. (1986). Who drops out of high school and why? Findings from a national study. Teachers College Record, 87, 356-373.
Feingold, A. (1988). Cognitive gender differences are disappearing. American Psychologist, 43, 95-103.
Fennema, E. (1974). Mathematics learning and the sexes. Journal for Research in Mathematics Education, 5, 126-129.
Fennema, E., & Carpenter, T. P. (1981). Sex-related differences in mathematics: Results from the National Assessment. Mathematics Teacher, 74, 554-559.
Fennema, E., & Peterson, P. (1985). Autonomous learning behavior: A possible explanation of gender-related differences in mathematics. In L. S. Wilkinson & C. B. Marrett (Eds.), Gender influences in classroom interaction (pp. 17-36). New York: Academic Press.
Halpern, D. F. (1986). Sex differences in cognitive abilities. Hillsdale, NJ: Erlbaum.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 7, 119-137.
Hedges, L. V., & Becker, B. J. (1986). Statistical methods in the meta-analysis of research on gender differences. In J. S. Hyde & M. C. Linn (Eds.), The psychology of gender: Advances through meta-analysis (pp. 14-50). Baltimore: Johns Hopkins University Press.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982).
Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.
Hyde, J. S. (1981). How large are cognitive gender differences? A meta-analysis using ω² and d. American Psychologist, 36, 892-901.
Hyde, J. S. (1984). How large are gender differences in aggression? A developmental meta-analysis. Developmental Psychology, 20, 722-736.
Hyde, J. S., & Linn, M. C. (Eds.). (1986). The psychology of gender: Advances through meta-analysis. Baltimore: Johns Hopkins University Press.
Hyde, J. S., & Linn, M. C. (1988). Gender differences in verbal ability: A meta-analysis. Psychological Bulletin, 104, 53-69.
Kimball, M. M. (1989). A new perspective on women's math achievement. Psychological Bulletin, 105, 198-214.
Linn, M. C., & Hyde, J. S. (in press). Trends in cognitive and psychosocial gender differences. In R. M. Lerner, A. C. Petersen, & J. Brooks-Gunn (Eds.), The encyclopedia of adolescence. New York: Garland Publishing.
Linn, M. C., & Petersen, A. C. (1985). Emergence and characterization of sex differences in spatial ability: A meta-analysis. Child Development, 56, 1479-1498.
Maccoby, E. E., & Jacklin, C. N. (1974). The psychology of sex differences. Stanford, CA: Stanford University Press.
Meece, J. L., (Eccles) Parsons, J., Kaczala, C. M., Goff, S. B., & Futterman, R. (1982). Sex differences in math achievement: Toward a model of academic choice. Psychological Bulletin, 91, 324-348.
Meyer, M. R. (in press). Gender differences in mathematics. In M. M. Lindquist (Ed.), Results from the fourth mathematics assessment of the National Assessment of Educational Progress. Reston, VA: National Council of Teachers of Mathematics.
National Council of Teachers of Mathematics. (1988). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.
Ramist, L., & Arbeiter, S. (1986). Profiles, college-bound seniors, 1985. New York: College Entrance Examination Board.
Rosenthal, R., & Rubin, D. B. (1982a).
Comparing effect sizes of independent studies. Psychological Bulletin, 92, 500-504.
Rosenthal, R., & Rubin, D. B. (1982b). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321-325.
Selkow, P. (1984). Assessing sex bias in testing. Westport, CT: Greenwood Press.
Sells, L. W. (1973). High school mathematics as the critical filter in the job market. In R. T. Thomas (Ed.), Developing opportunities for minorities in graduate education (pp. 37-39). Berkeley: University of California Press.
Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist, 32, 752-760.
Stage, E. K., Kreinberg, N., Eccles, J. R., & Becker, J. R. (1985). Increasing the participation and achievement of girls and women in mathematics, science, and engineering. In S. S. Klein (Ed.), Handbook for achieving sex equity through education (pp. 237-269). Baltimore: Johns Hopkins University Press.
Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103, 374-378.

Appendix

Studies Used in the Meta-Analysis

Alabama Department of Education. (1986-1987). [State mathematics assessment]. Personal communication: Rex C. Jones.
American College Testing Program. (1987). State and national trend data for students who take the ACT assessment. Iowa City, Iowa: Author.
Backman, M. E. (1972). Patterns of mental abilities: Ethnic, socioeconomic, and sex differences. American Educational Research Journal, 9, 1-12.
Behrens, L. T., & Vernon, P. E. (1978). Personality correlates of over-achievement and under-achievement. British Journal of Educational Psychology, 48, 290-297.
Bell, C., & Ward, G. R. (1980).
An investigation of the relationship between dimensions of self concept (DOSC) and achievement in mathematics. Adolescence, 15, 895-901.
Benbow, C. P., & Stanley, J. C. (1980). Sex differences in mathematical ability: Fact or artifact? Science, 210, 1262-1264.
Benbow, C. P., & Stanley, J. C. (1983). Sex differences in mathematical reasoning ability: More facts. Science, 222, 1029-1031.
Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1979). Differential aptitude tests: Fifth edition manual. New York: Psychological Corporation.
Boli, J., Allen, M. L., & Payne, A. (1985). High-ability women and men in undergraduate mathematics and chemistry courses. American Educational Research Journal, 22, 605-626.
Brandon, P. R. (1985, April). The superiority of girls over boys in mathematics achievement in Hawaii. Paper presented at the 69th annual meeting of the American Educational Research Association, Chicago, IL. (ERIC Document Reproduction Service No. 260 906)
Carrier, C., Post, T. R., & Heck, W. (1985). Using microcomputers with fourth-grade students to reinforce arithmetic skills. Journal for Research in Mathematics Education, 16, 45-51.
Connecticut Department of Education. (1987). [State mathematics assessment data]. Unpublished raw data.
Connor, J. M., & Serbin, L. A. (1980). Mathematics, visual-spatial ability, and sex roles (Final Report). Washington, DC: National Institute of Education. (ERIC Document Reproduction Service No. 205 385)
D'Augustine, C. H. (1966). Factors relating to achievement with selected topics in geometry and topology. The Arithmetic Teacher, 13, 192-197.
Davis, E. J. (1973). A study of the ability of school pupils to perceive and identify the plane sections of selected solid figures. Journal for Research in Mathematics Education, 4, 132-140.
Dees, R. L. (1982). Sex differences in geometry achievement. Paper presented at the annual meeting of the American Educational Research Association, New York, NY.
(ERIC Document Reproduction Service No. 215 873)
De Wolf, V. A. (1981). High school mathematics preparation and sex differences in quantitative abilities. Psychology of Women Quarterly, 5, 555-567.
Dick, T. P., & Balomenos, R. H. (1984). An investigation of calculus learning using factorial modeling. Paper presented at the 68th annual meeting of the American Educational Research Association, New Orleans, LA. (ERIC Document Reproduction Service No. 245 033)
Dossey, J. A., Mullis, I. V. S., Lindquist, M. M., & Chambers, D. L. (1988). The mathematics report card: Are we measuring up? (1986 National Assessment of Educational Progress Report No. 17-M-01). Princeton, NJ: Educational Testing Service.
Edge, O. P., & Friedberg, S. H. (1984). Factors affecting achievement in the first course in calculus. Journal of Experimental Education, 52, 136-140.
Educational Testing Service. (1987). A summary of data collected from Graduate Record Examinations test-takers during 1985-86. Princeton, NJ: Author.
Engle, C. D., & Lerch, H. H. (1971). A comparison of first-grade children's abilities on two types of arithmetical practice exercises. School Science and Mathematics, 71, 327-334.
Ethington, C. A., & Wolfle, L. M. (1984). Sex differences in a causal model of mathematics achievement. Journal for Research