Feminist “Scores:” Their Impact on Psychological Testing

By Patricia J. Anderson

Psychological Assessment I (LAP 501), Spring 1997, Dr. Karen Jaffe

After testing 9,337 people in his Anthropometric Laboratory in 1884, Sir Francis Galton summarized his findings this way: “women tend in all their capacities to be inferior to men” (Lewin & Wild 1991, p. 582). Over a hundred years later, this kind of erroneous belief still lies deeply internalized within the minds of many men and women.

In their essay, Miriam Lewin and Cheryl Wild include the work of women psychologists whose findings dispute those of Galton. In 1895, Mary Whiton Calkins and her student Cordelia Nevers repeated work done by Joseph Jastrow (a follower of Galton) on the “mental traits of sex” (Lewin & Wild 1991, p. 582). Calkins and Nevers’s results did not demonstrate female inferiority (Lewin & Wild 582).

Helen Thompson Woolley argued against the popular views of Darwin and Galton in her book, The Mental Traits of Sex (1903). A psychology student of Woolley, Leta Stetter Hollingworth, also challenged male theories about women’s biological inferiority.

During the 1920s, work by Beth Wellman, Marie Skodak and Harold Skeels on intelligence testing had a vital impact by demonstrating the profound effect of social environment on supposedly static intelligence traits. Their work has been crucial to debunking sexist and racist thinking within psychological testing (Lewin & Wild 582).

Rhoda Unger and Mary Crawford (1992) also discuss Galton’s work. Galton measured things like reaction time, grip strength and height because he thought these were innate and were a mirror to intelligence (76). The only people who questioned Galton’s theories were women, but because of their perceived inferior status, they were not heard when they posited that opportunity and life experience were involved in intelligence (Unger & Crawford 76). Despite later testing that revealed no differences in variability, brain structures or intelligence between the sexes, the belief in women’s inferiority lives on to this day (Unger & Crawford 76). From the vantage point of today, it’s relatively easy to see how racism and sexism among past researchers might have led them to find justification for labeling women and people of color inferior (Unger & Crawford 77).

Despite the no doubt hard work of these feminists, I have not found their names cited in any of the numerous books on psychological testing that I have been reading. The textbook for this class, Psychological Testing (1997), makes no mention of these early feminists’ work that challenged Sir Francis Galton’s findings. Unless these harmful and erroneous findings are actively challenged, traditional assumptions of female inferiority are left to linger in our brains. If the works of these women were included in psychology programs, along with the history of men like Galton, today’s students would be much more enlightened about the issues of sex differences and testing. [Teacher comment: These sex differences are not mentioned in our textbook.]

Lewin and Wild have provided an update on the progress, or lack thereof, that has resulted since the recent inclusion of women’s voices in the field of psychological testing. They specifically discuss the Strong-Campbell Interest Inventory and the Minnesota Multiphasic Personality Inventory (MMPI).

There are many reasons why feminists have criticized psychological tests. First, some measures discriminate against females by addressing things from the perspective of typical males in stereotypically male settings. Second, feminists have found that assumptions have been made (without adequate evidence) that women as a group have less of a particular characteristic if their scores were lower than men’s. [Teacher comment: Like what?] No one considers males as lacking in any way when they score lower on a stereotypically female trait. A third reason for feminist criticism is the use of traditional concepts about femininity and masculinity, masochism, violence and rape as variables (Lewin & Wild 582). If not for feminist inquiry, new concepts such as androgyny, sexual harassment and date rape, and measures such as the Rape Myth Acceptance Scale, wouldn’t be in existence (Lewin & Wild 582-3). Fourth, feminists posit that operational definitions must be of concern if the originating conceptual definitions are questionable. Lewin and Wild offer an example in which femininity was measured via a criterion group of thirteen gay males without first proving that gay males were valid examples of femininity. Lastly, feminists have argued that biases within tests and measures resulted in women being denied admission to schools, denied jobs, and improperly diagnosed with mental illnesses when their actual problems stemmed from oppressive environments (Lewin & Wild 583).

Women’s historical failure to measure up to male standards on tests created by males has been used to prove women’s lower status and to justify men’s higher status and power in society. Feminists refute the notion that the standards we should all be directed to or measured against are those that come from dominant males. Some feminists would go so far as to say that “their way” held the possibility of being “… even better than, the stereotypical male way” (Lewin & Wild 1991, p. 583).

Lewin and Wild ask, “How can we approach truth?” (1991, p. 584). Surely truth cannot come only from male or female perspectives. Surely a combination that includes the experience and “knowing” of women can come closer to “truth” than historic masculinist models.

The most widely used psychological test is the MMPI. Its most extensive revision, the MMPI-2, was put out in 1990. Scale 5, Mf (masculinity-femininity), was validated for femininity in 1956 by a criterion group of thirteen gay males. Original descriptions of the scale clearly attest to the fact that the attempt to measure “sexual inversion” was a failure (Lewin & Wild 1991, p. 585). However, this fact was less prominent in test manuals (Lewin & Wild 585).

Feminists had minimal success in effecting change within the MMPI. Only four of the sixty items on the Mf scale were deleted due to their offensive nature. New norm samples were drawn from the United States population to get new means, percentiles and scale score distributions. Despite these new means, the basic Mf scale 5 was never validated by correlating it with any type of criterion (Lewin & Wild 585). Among the revisions made on the F scale of the MMPI, one was done because of sexist language (Rothke et al. 1994).

The new MMPI-2 now includes scales that can be used for both sexes; the Gm (masculine gender role scale) and the Gf (feminine gender role scale) were taken from items on the old scale. These new scales include only items that at least 70 percent of respondents of one sex answer a given way and that no more than 60 percent of respondents of the other sex answer the same way. Items are scored for extremes only. Lewin and Wild give the example of how the item “I like to read mechanics magazines” is scored: because men split about half in agreeing with this item, men get no point no matter how they answer it; a woman who answers false gets a point in favor of femininity because 70 percent or more of the women in the sample answered false (585-586). The authors rightfully question whether we can gauge femininity and masculinity in this manner (Lewin & Wild 586). Couldn’t one be feminine and like to read mechanics magazines? Couldn’t a man be masculine without enjoying mechanics magazines? These measures serve to trivialize the meaning of both genders.
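The selection rule and the mechanics-magazines example described above can be sketched in a few lines of Python. This is only an illustration of the rule as this paper paraphrases it, not the actual MMPI-2 scoring procedure; the function names and the response rates (72 percent of women and 50 percent of men answering false) are hypothetical values chosen to match the description.

```python
def item_qualifies(rate_target_sex, rate_other_sex,
                   min_target=0.70, max_other=0.60):
    """Return True if an item meets the selection rule as paraphrased above:
    at least 70% of one sex gives the keyed answer and no more than 60%
    of the other sex gives that same answer."""
    return rate_target_sex >= min_target and rate_other_sex <= max_other


def femininity_point(respondent_sex, answer):
    """Score the mechanics-magazines item as the paper describes it:
    a woman answering 'false' earns one point toward femininity;
    men earn no point whichever way they answer."""
    return 1 if respondent_sex == "female" and answer == "false" else 0


# Hypothetical rates of answering "false" to "I like to read mechanics magazines".
women_false = 0.72   # assumed; the paper says "seventy percent or more"
men_false = 0.50     # assumed; the paper says men "split about half"

# The item qualifies for the feminine scale, keyed "false".
print(item_qualifies(women_false, men_false))          # True

# It does not qualify for the masculine scale, keyed "true",
# because only about half of men answer "true".
print(item_qualifies(1 - men_false, 1 - women_false))  # False

print(femininity_point("female", "false"))  # 1
print(femininity_point("male", "false"))    # 0
```

Written out this way, the rule makes plain that a single item separates the sexes only by a difference in response rates, which is the very thing the authors question as a measure of femininity or masculinity.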

Face validity is also questionable as far as the meaning of the concepts rated. Women get points for femininity on items such as “I like to talk about sex” and “I am worried about sex” when they answer false; males get points for answering true to these. What do these concepts mean (Lewin & Wild 586)? Could it not be that both men and women like to talk about and worry about sex?

Lewin and Wild call into question the manual’s explanation of the characteristics used to gauge femininity and masculinity. The manual claims that males scoring highly feminine are likely to be sensitive, aesthetic, passive and may even have a low heterosexual drive, in contrast to males who score low and are deemed to be aggressive, crude, adventurous and reckless, with narrow interests. No evidence is offered to substantiate these claims about these traits. Despite the fact that the authors of the MMPI-2 have admitted that the Mf scale is ambiguous, people who use the test may not know this (Lewin & Wild 586).

This leaves feminists to ask whether this test ought to be used to screen people looking for jobs. Employer bias could occur in either direction. Masculine men may be thought unlikely to be happy in a creative job, and a woman who scores high on femininity may be questioned about her ability to fire someone if she were in management. There is a class action suit pending in California against a department store that used the old MMPI as a hiring tool (Lewin & Wild 587).

The Mf (MMPI) scale isn’t a valid measure of sexual preference or of how masculine or feminine a person is. The fact that femininity was measured against responses from gay males speaks for itself (Lewin & Wild 587). The fact that test creators would even consider using gay men to measure women demonstrates the extent to which men are consistently used to develop “norms” that women are expected to measure up to. The criterion also wrongly assumes that gay men are feminine.

According to Friedman, an expert on the MMPI, Scale 5 (masculinity-femininity) of the MMPI-2 was originally used to detect homosexuality. Today it’s used to measure interest patterns (Friedman 1997). Friedman says that low scores on this scale reflect the fear of being cared for and the missing joy of being cared for (1997). He used a fellow author and friend to demonstrate how a male can be married with kids, but also love fashion and shopping (Friedman 1997). Clearly his description of the Mf scale does represent evidence of feminist influence. His description gave me hope that, at least within psychology, stereotypical attitudes about gender are being challenged.

Friedman (1997) also says that the MMPI can detect men who would commit date rape. As a feminist, I immediately ask myself why the test hasn’t been used to weed out potential rapists. What use of testing could be more important than protecting women from harm? If the test can detect date rapists, then it could also be used to detect therapists who would take advantage of their clients by having sex with them. Since rape is about control and power, not sex, therapists in essence do rape clients when they have sex with them. Why isn’t this test being used to prevent harm? [Teacher comment: Individual rights are protected on both sides.]

One of the major causes cited for malpractice claims against therapists is sexual misconduct. Occurrence rates for sexual intimacies and harassment inflicted by therapists are shocking and have been increasing (Corey 1996, 74). Corey credits better reporting procedures and increased public awareness with these increases (1996). With such shocking rates of sex occurring between therapists and clients, wouldn’t it make sense to use a psychological test that could weed out potential client abusers?

Feminists have yet to make a dent in the way people are assessed for Post-Traumatic Stress Disorder. Not a single study used women to develop the measures. Women who were rape or incest victims, or army nurses, could have been included alongside the male combat veterans and prisoners of war used in the seventeen studies (Lewin & Wild 587).

Lewin and Wild also discuss the Strong-Campbell Interest Inventory as an example of where feminist critique has had a positive result. Here feminists asked whether women should be judged by what a typical male feels about his occupation. After studying sex bias in the inventory, the American Measurement and Evaluation in Guidance Commission found that the fourth edition had been much improved. Only five of the 207 occupations in the fourth edition lacked samples taken from both men and women (Lewin & Wild 588).

The evolution of the Standards for Educational and Psychological Testing is evidence of the inclusion of women. Its 1985 revision says that there is a concern about the role of testing in the attainment of social goals. New developments such as gender specificity, cultural bias, validity generalization, interpretations done via computer and scores flagged for those with disabilities are among the cautions that were brought out (Lewin & Wild 589). The previous 1974 standards didn’t address gender issues; the 1985 revision does address differences in gender, bias of certain items and differential predictive measures. The Code of Fair Testing Practices in Education (1988) agrees with the 1985 revision, as do all major testing organizations in education (Lewin & Wild 589).

Women score lower on the Scholastic Aptitude Test (SAT) than men, but these tests fall short of predicting how women will perform in college because women are known to get better grades in higher education (Unger & Crawford 87). In the courses whose content the tests reflect, girls consistently attain higher grades (Unger & Crawford 93). As a result of their lower test scores, women lose out on many scholarships; more than 750 organizations award scholarships according to a test score. Women also lose when they are wrongfully denied inclusion in gifted programs. These concrete losses are compounded by the fact that lower test scores affect women’s sense of confidence about their ability to succeed in school (Unger & Crawford 87).

The Educational Testing Service mandated a sensitivity review process in 1980. It resulted in changes in the SAT Verbal tests toward a more balanced referencing of males and females. Items that were thought to be related specifically to being male or female were dropped from scoring (Lewin & Wild 590-591). There has also been an increase in the number of women on test committees. In 1970-71, women made up only 6 percent of the testing committee for the Graduate Record Exam (GRE), compared to 29 percent in 1990-91. Other programs demonstrate moves in the same direction (Lewin & Wild 591).

Unger and Crawford (1992) explain that it’s language that has described women and men as “opposite sexes” (67). The journal Psychological Abstracts reported 16,416 articles on sex differences between 1967 and 1985. Unger and Crawford claim that the differences thought to be discovered between women and men are rarely related to biology. They refer to sex differences as carrier variables in personal history and experience. Thus feminist psychologists describe the differences found as gender-related (Unger & Crawford 1992, p. 67).

Feminists critique the very definition of gender-related differences, the problem of measuring them, and the difficulty of understanding the results due to issues of interpretation and values. Feminist researchers discovered that historically women’s unequal status was justified by differences documented as scientific facts. Finding new differences between the sexes neglects to explain the societal influences that led to the differences. Feminists argue that the differences between women and men are far fewer than the actual commonalities (Unger & Crawford 67). The notion of statistical significance can be far removed from the practical meaning of the word significance. “… Statistical significance is not the same as importance” (Unger & Crawford 69).
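The gap between statistical significance and importance can be shown with a small worked example. The sketch below is only an illustration under assumed numbers: the group means, standard deviation and sample size are hypothetical and do not come from Unger and Crawford or from any real study. It uses a simple two-sample z-test and Cohen’s d (the difference in means divided by the standard deviation) as one common measure of effect size.

```python
import math


def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """Return (z, two-sided p-value, Cohen's d) for two groups of equal
    size that share a common standard deviation."""
    diff = mean_a - mean_b
    d = diff / sd                                # effect size (Cohen's d)
    se = sd * math.sqrt(2.0 / n_per_group)       # standard error of the difference
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
    return z, p, d


# Hypothetical test: half a point of difference on a scale with SD 15,
# measured on 20,000 people per group.
z, p, d = two_sample_z(100.5, 100.0, 15.0, 20000)

print(f"z = {z:.2f}, p = {p:.4f}, d = {d:.3f}")
print("statistically significant at .01:", p < 0.01)
print("effect smaller than a 'small' effect (d < 0.2):", abs(d) < 0.2)
```

With samples this large, a difference of a thirtieth of a standard deviation passes a conventional significance test even though the two groups overlap almost completely.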

Unger and Crawford use the image of looking through a microscope as a way to explain how researchers perceive their hypotheses in terms of results. If the researcher gazes through the microscope and views what was expected, the hypothesis is deemed correct. If the view down the microscope shaft is blurry or shows nothing, the methods are blamed and the procedures are tried again, instead of concluding that the hypothesis was wrong to begin with (Unger & Crawford 70).

Within studies of gender-related differences, researchers have many times studied only one of the sexes and posited the results as a difference between the sexes. One example is measuring hormonal differences as they correlate with mood only among women, and then saying that only females experience this phenomenon (Unger & Crawford 71). Feminists also question the practice of taking samples for many research studies from college students. These female and male students may have equal levels of formal education, but may differ greatly in the types of classes taken from the start of high school and will frequently be very different while in college. These differences may be crucial to women’s lives (Unger & Crawford 72).

My own experience serves to validate what Unger and Crawford have said. The Miller Analogies Test had little face validity for me. I never had a college math, philosophy or literature course. I studied nursing, physical and social science and medical ethics. Despite the fact that I graduated with honors from my nursing associate degree program, held a 4.0 GPA (four point scale) in the rest of my baccalaureate studies, and had already obtained A’s in five graduate courses at DePaul University, I failed this test. Out of one hundred questions, I got 28 correct, placing my score within the 25th to 30th percentile. I can now see that the courses I chose to study explain why the test had so little face validity for me and why I was unprepared for it. Thanks only to my feminist education and to the last ten years of excellence in undergraduate and graduate school, my self-confidence was not affected by this apparent failure.

My experience validates the claim of Unger and Crawford that “A valid psychology of gender difference must account for how individual experiences and situational variables interact with sex” (Unger & Crawford 74). Even new sophisticated techniques of meta-analysis do not lead us to any conclusions about the causes of differences historically found in published studies (Unger & Crawford 75). Traits usually connected with people of color and women, when compared to those of the “reference group,” are less affirming and desirable. “… Separate but equal …” stratification remains elusive (Unger & Crawford 77).

Most psychological research measures behaviors outside of their social environment, which leads feminists to question whether the results can be extrapolated to the real world; taken out of context, objectivity becomes a mirage (Unger & Crawford 98). According to testing specialist Phyllis Teitelbaum, standardized tests are androcentric in their epistemology in that they fail to measure skills such as creativity, intuition, verbal and non-verbal communication, cooperativeness, sensitivity and supportiveness, an omission that reinforces the androcentric model’s values and way of seeing the world. When something is not tested, it is valued less than the items included on tests (Unger & Crawford 98).

Because of feminist inquiry, some psychological tests have been revised for the better. Sexism is more likely to be challenged today because of the work of feminists, despite the failure to promote change on the MMPI-2 Mf scale. The field of psychology and testing will evolve slowly along with the increasing presence of women in the field (Lewin & Wild 593).

Works Cited

Anastasi, Anne, and Urbina, Susana. (1997). Psychological Testing. Prentice Hall: New Jersey.

Corey, Gerald. (1996). Theory and Practice of Counseling and Psychotherapy. Fifth Edition. Brooks/Cole Publishing Company: New York.

Friedman, Alan. (1997). Lecture on the MMPI Test. National-Louis University, Wheeling Campus. May 5.

Lewin, Miriam, and Wild, Cheryl L. (1991). “The Impact of the Feminist Critique on Tests, Assessment, and Methodology.” Psychology of Women Quarterly, 15, pp. 581-596.

Rothke, Steven E., Friedman, Alan F., Dahlstrom, W. Grant, Greene, Roger L., Arredondo, Rudy, and Mann, Anne Whiddon. (1994). “MMPI-2 Normative Data for the F-K Index: Implications for Clinical, Neuropsychological, and Forensic Practice.” Assessment, vol. 1, number 1, pp. 1-15.

Unger, Rhoda, and Crawford, Mary. (1992). Women and Gender: A Feminist Psychology. New York: McGraw-Hill, Inc.

Published by

trishandersonlcpc@yahoo.com