BELL CURVE: ANATOMY OF AN ANALYSIS
---

Abstract

Claudia Krenz, Ph.D. (datafriend @ gmail-.-com)

This is a web page about a logistic regression: the first one interpreted [1] in The Bell Curve (Herrnstein & Murray, hereafter HM; New York: The Free Press, 1994). There has been much discussion of the book but no examination of this or any other of its published analyses. The Bell Curve continues to generate discussion and to influence U.S. social and educational policy. Its conclusions were based on the authors' interpretations of the results of 100 separate statistical analyses (logistic regressions). Although printouts from all the analyses were included in one of the book's appendices, the authors were criticized for publishing them without first submitting them to peer review.

Despite these--and many other--objections, there has been no investigation of the published statistical analyses on which the book's conclusions were based. The same simple statistical model was used throughout: 3 variables--scores on the Armed Forces Qualification Test (AFQT), a socioeconomic status index (SES), and AGE. This model was used to predict 100 different outcomes for different subgroups of the Bureau of Labor Statistics' venerable National Longitudinal Survey of Youth (NLSY). HM argue that AFQT scores are a measure of intelligence. Almost all their analyses were logistic regressions, including the one examined here, the first discussed in the book.

[image: the ghost of Gauss]

This page examines the book's statistical output (Appendix 4, p. 622) and the authors' interpretation of it (Chapter 5, p. 127). In this analysis, HM used their 3-variable model to predict whether white NLSY cases lived above or below the POVERTY level. HM took AFQT and SES scores from the beginning of the NLSY and used them, along with their AGE covariate, to predict POVERTY a decade later. Murray made the book's data public long ago: anyone with an internet connection can download them. The sample reported in the book's first analysis is obtained by excluding, from the higher-income NLSY X-Sectional subsample, all but white non-students with no missing values on any of the 4 analysis variables--AFQT, SES, AGE, and POVERTY (N=3367). While HM used JMP, any commercially available statistical package can be used. Replicating the first analysis with Stata yielded numbers identical to those published. HM did not, however, examine their residuals; I did.
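For concreteness, here is a minimal Stata sketch of the sample derivation and model fit just described. It is not HM's actual code: the file name and variable names (nlsy.dta, xsection, white, student, zafqt, zses, age, poverty) are hypothetical stand-ins for whatever names your read-in program assigns.

    * A sketch, not HM's code: file and variable names are hypothetical.
    use "nlsy.dta", clear
    * Keep the higher-income X-Sectional subsample: white non-students only.
    keep if xsection == 1 & white == 1 & student == 0
    * Drop cases missing any of the 4 analysis variables (leaving N=3367).
    drop if missing(zafqt, zses, age, poverty)
    * HM's 3-variable model: a logistic regression predicting POVERTY.
    logit poverty zafqt zses age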

HM knew not to interpret their data analysis without examining the fit of their model--here, the 3 variables mentioned above. HM wrote that "the usual measure of goodness of fit for multiple regressions [is] R-square" (p. 617). Based on the R-square statistic, HM concluded that their model adequately fit their data. Murray, however, subsequently learned otherwise, later referring to R-square as "ersatz and unsatisfactory" in this context (1995). Another way to examine the fit of a logistic regression model is the classification table, an intuitively obvious cross-tabulation of cases' predicted status by their actual status: here, whether cases predicted by HM's 3-variable model to be above or below the poverty level actually were above or below it.
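In Stata, for example, the classification table is one post-estimation command away from the model fit itself--a sketch, assuming a logit model like the one above has just been fit (0.5 is the default cutoff; cases whose predicted probability meets it are classified as events):

    * Cross-tabulate each case's predicted status against its actual status.
    * The default cutoff is 0.5; it is stated explicitly here for clarity.
    estat classification, cutoff(0.5)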

After obtaining results identical to those published (down to four significant digits), I examined the fit of HM's model via its classification table: HM's model predicted none of the cases living below the poverty level correctly (N=244); all were incorrectly predicted to be living above it. The results of this first analysis are uninterpretable because the model predicts none of its cases of interest correctly. The Analysis section below shows my replication of HM's published analysis (and its corresponding classification table); the Subjects section describes the NLSY, and the Variables section, the variables used in the analysis.

What about the other 99? Concomitant with the rise of the theory of the underclass has been the dumbing down of America. Could it be that, since genius is not hereditary, those now in power--the offspring of those in power when capital started imploding, when power began consolidating, when the playing field began tilting and becoming increasingly uneven--the rarely-in-doubt and rarely-correct crowd--are making dumber and dumber decisions, and that the tide is going out with them? Since the book's other 99 analyses used the same model as its POVERTY analysis, all of its interpretations are suspect.

[image: distorted ruler]

Site Map

Links to the 5 sections of this web page are listed below: What variables did HM use (Variables)? Which subjects (Subjects)? Did their model fit (Analysis)? Who is the analyst (Analyst--this web page being no less "theory laden" than any other human cognitive activity)? And where is everything documented (Documentation, described immediately below)?

Documentation
In this section, you'll find links to the data, to the programs I wrote to read them, compute new variables, and run the analyses, and to a bibliography. The sections that follow further describe my replication of HM's POVERTY analysis.
You can do exactly as I did: download the data and analyze them. (Murray made them available; I stumbled upon them years ago at a still-viable link and downloaded them on the spot--3.5 MB was a commitment back then.) The internet has given us a new way of making knowledge public--data, even more than software, want to be free. The "entry fee" in the present example is a functioning statistical package and unfettered public internet access (not even an industrial-strength search engine is needed, because of URLs and links). Setting the level of access to knowledge low is in the public interest. The empirical history of how the microsoft corporation behaved, for example, is a matter of public record (for this you will need that search engine): words later used to characterize events don't alter their having occurred, and anyone anywhere anytime can search on a phrase like "E pur si muove." "Sharing data" in the decentralized public domain is in the public interest.
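If you do download the data, a sensible first step is simply reading them in and eyeballing them before fitting anything. A hedged sketch in Stata--the file name, layout, and variable names here are hypothetical and would need to be adapted to the actual download and its codebook:

    * Hypothetical: a free-format ASCII file with 5 columns per case.
    infile caseid afqt ses age poverty using "nation.dat", clear
    * Check Ns, means, and ranges against the codebook before any analysis.
    summarize afqt ses age poverty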

The commands--what to type to get output with the same numbers HM published in New York City--are simple. Anyone with a stat package can look at its default printout for the kind of analysis HM conducted, "logistic regression," and see the same numbers published on the first page of their Appendix 4 (agreement to four decimal places is close enough for me). HM are not liars. In Popperian terms, the statement "HM made up/conjured/fabricated/prevaricated/etc., their data" can be falsified: that I got the same numbers, and that you can too, makes such knowledge public--no need for "journalists." Do it yourself, empirically, individually.

Setting the bar for access to knowledge low is in the public interest, as is setting the bar for what constitutes knowledge high, i.e., more rigorous--a point Platt (1964) made strongly, in a paper now online.

HM did misinterpret their data: you can establish this for yourself as easily--publicly and empirically--as the numerical replication of the first source table in their Appendix 4 discussed above. Just so, that Radio Netherlands observed that "Radio Tikrit" included an astrological forecast is a matter of fact and public record--but whether that meant the U.S. was trying to convert those within its broadcast range to astrology would be a matter of interpretation.

HM were criticized for not having submitted their work for "peer review" before publication--for not sharing it with other academic scientists, "peers" in the sense of substance, not of being employed by U.S. academic institutions--as were Pons and Fleischmann, who announced "cold fusion" to journalists in Salt Lake City, Utah [living 200 miles north at the time, I heard it said on local news: it's been an exciting life I've led]. Would HM's misinterpretation have been caught by the "peer review" process? I'd guess it would depend on the journal--as Platt noted, a "lifetime of achievement" in one discipline can be equivalent to just a few years in one more rigorous.

Given the prevalence of uncertainty--and being at least as smart as squirrels--humans have instituted epistemological processes like "peer review." An example is a medical journal editor noticing that the manuscript s/he's reviewing reports more "degrees of freedom" in its results section than it had "subjects" earlier and deciding against publication. Varmus studied "peer review"--operationalized by extramural review--as an empirical process, using scores aggregated over reviewers and committees as empirical data. Sometimes "peer review" fails us, e.g., the methodologically flawed racist articles published in U.S. academic journals in the 1920s (or, worse, in the former totalitarian u.s.s.r., where publication meant "biologist" Lysenko agreed with you--and if he didn't, you'd be lucky to be merely jailed). To the extent we forget that "peer review" is an empirical process, we risk "a return to the dogmatism of science of the middle ages and of a number of religions today" (Robertson, 1999). And I think the issue of public math and "peer review" is unavoidable: the first interactive site I found online let anyone specify levels of variables like "prevalence" and "fertility" to compute outcomes like the number of orphaned children.

By and large, though, epistemological behaviors like "peer review" serve us well, errors typically being, as in Student's title, "Errors of Routine Analysis"--positively skewed in terms of impact, i.e., most such errors are mundane. We must still remain cognizant of the fact that a "peer reviewed" anything is a tautology: something reviewed by someone someone else calls a "peer." Galileo's example underscores the importance of who constitutes such committees--putting the pet theory of Urban VIII, a man with the power to imprison him, into the mouth of "Simplicio," the simpleton in a three-way conversation, does not strike me as accidental. Perhaps Galileo questioned the authority of "belief" (requiring only "subjective certainty") as a litmus test for knowledge, against what he had observed: "E pur si muove," nevertheless it moves.

The public internet has given us a new, never-before-in-our-species form of decentralized, countryless knowledge. That this page has been cited in books with titles like "quantitative genetics" and "multivariate analyses" suggests that others have used the same data I link to and gotten the same results I did (a great textbook example: it is rare to get so unmessy a number as 0). It's now less necessary to take anyone's word for anything--and that can't help but facilitate the growth of knowledge. I got numbers identical to those HM published in The Bell Curve and, on examination, noticed that they meant nothing. You can follow the links in the Site Map and get exactly the same results as I did (and as HM did before me). It is beyond the scope of the present page to address questions like whether The Bell Curve should be classified as an "urban legend," or questions about poverty generally (for those I refer the reader to Real Change, one of Seattle, WA's better papers, available online and on the street). The purpose of the present page is to empirically examine one analysis, the first: the POVERTY analysis.

Variables
Where did HM get their data, and what variables did they compute? This section describes how HM computed their variables. Note 1 discusses epistemological problems inherent in all human knowing and hence in all social research; Note 2 examines what HM had to do to create variables in JMP, their statistical package; Note 3 describes the characteristics of a bell, or gaussian, curve; and Note 4 shows some raw data.

Subjects
Who are the subjects in HM's analysis? This section examines the NLSY, shows how HM derived their white non-student X-Sectional analysis sample, and identifies a parallel NLSY Supplemental sample. Note 1 compares the NLSY to other U.S. national longitudinal studies; Note 2 delineates the standard ways HM dealt with the problem of missing values; and Note 3 empirically examines the distributions of HM's four variables in the two NLSY subsamples.

Analysis
I got the same numbers, or results, as HM. This section shows my replication--in the sense of duplicating the numbers published in the book--of HM's POVERTY analysis using the X-Sectional sample: the classification table from this analysis shows that their model predicted none of the cases living below the poverty level correctly. Note 1 shows the classification table resulting from replicating--in the sense of repeating a procedure--HM's model variables as predictors of POVERTY on an independent group of NLSY Supplemental subjects (N=1067); this table shows that HM's model predicted only a few cases living below the poverty level correctly. Since HM's statistical analysis says little (in the independent-group replication) to nothing (in the published analysis) about cases living below the poverty level, interpretation of its model-variable coefficients is unwarranted.
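A sketch of that independent-group replication, again in Stata and again with hypothetical file and variable names (a supplemental flag is assumed to mark the Supplemental subsample):

    * Same 3-variable model, fit to the independent Supplemental sample.
    use "nlsy.dta", clear
    keep if supplemental == 1 & white == 1 & student == 0
    drop if missing(zafqt, zses, age, poverty)
    logit poverty zafqt zses age
    * The classification table again shows how the model treats
    * the cases of interest: those actually below the poverty level.
    estat classification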

Analyst
Click on the link above to read about my situationally grounded perspective: philosophy of measurement and bone density. Call the former "interpretative" and the latter "two standard deviations above the mean." Do I--female--have osteoporosis? (I'm not as tall as I was--two inches less than the average--and I have two "risk factors," other variables correlating with having osteoporosis.) I think not, because, based on a standard normal distribution, the probability of a Type II error is very small. Less exciting reading is my resume--most things I do not know how to do; a few, yes. One of my favorite quotations is from Martin Heidegger: what's "most thought-provoking in these thought-provoking times is that we still are not thinking."

Footnotes

[1] Although the subjects in this first analysis were whites only, the book's fame-notoriety came from its positing a relationship between "ethnicity" and intelligence ... Were we to substitute "nationality" for "ethnicity," the argument--keep in mind that test-score differences can be discussed without positing unnecessary constructs like intelligence--would go like this: American students score lower on standardized math tests than students in many other nations; therefore, Americans are less intelligent.

But, again, to conclude that Americans are less intelligent than other humans, we'd need to establish, were we being rational, that standardized math test scores were a good measure of intelligence (and dismiss repeated instances of doing dumb things with numbers as pathological) ... As with discussions of "nationality" and intelligence, so with discussions of "ethnicity" and intelligence: a link between the variable and the construct of interest must be established before interpreting one as an indicator of the other. This web page isn't about that debate. It's an empirical examination of a statistical analysis published in The Bell Curve--the first one.

[3] Examining the impact of Kant's writings on all subsequent thinkers--including Kuhn, 1962, to whom all research experience-in-the-world is but a footnote--illustrates the "peer review" process. Kant credited Hume with awakening him--he'd been a full professor for a decade--from his "dogmatic slumbers": Kant synthesized Leibniz's rationalism and Newton's empiricism by placing the INDIVIDUAL human knower at the center of all knowledge. He outlined a model whereby a knowing mind might construe a world, arguing that understanding is a product of our perceptions and "reason" working together: the mind experiences nothing without perceptions--and has nothing to think about without "reason." In the preface to the second edition of his first Critique, Kant noted that "Reason approaches nature ... to be taught by it ... not in the character of a pupil who listens to everything that the teacher chooses to say, but of an appointed judge who compels the witnesses to answer questions which he has himself formulated ..." (p. 20).

He agreed with the rationalists that what is known through the senses is merely appearance and that reason plays a critical role in the knowledge chain--a point on which he disagreed with the empiricists. With the empiricists he agreed that human knowledge is grounded in the perceptions of our physical senses and that the physical world--"the thing-in-itself"--is unknowable in the sense of unprovable (the problem of induction).

Kant separated "knowledge" from "belief" and "opinion": the first requiring both objective and subjective certainty; the second, only subjective certainty; and the third, neither. In so doing, his "critical philosophy" separated science from religion and placed ethics beyond revelation, convention, and outside authority.

Prominent rationalist and empiricist academic-journal "peer reviews" of Kant's Critique of Pure Reason (1787) were scathing--several university towns banned his book as "subversive" ... But in less than a decade--it took just months for researchers to discredit "cold fusion" by communicating on USENET their many unsuccessful attempts to replicate P & F's results in their own labs--Kant's "critical philosophy"--what Palmquist (1996) called his "Copernican turn"--was being taught throughout his region: students and recruiters from rival universities flocked to him, and some regarded him as a seer on matters irrelevant. As for the latter, there are always those who set the bar of explanatory relevance so low that they believe in astrology, etc. As for the former, Kant's peers had not been targeted and befuddled by advertising campaigns; no opinion polls had been taken; it wasn't because he was employed by the University of Königsberg or because enron or u-haul math had occurred--and there are no questions of authenticity or verisimilitude; nothing magical or celestial happened: Kant's "peers" came to agree with him--"free will"?--because they thought him correct.

In the centuries since--whether quantitatively measured in MB or qualitatively reflected in text like the title of one of Karatani's essays, "On the Thing-in-Itself," and Heidegger's term "being-in-the-world"--more has been written about him than he wrote. The insight of his Critiques--yes, three: Kant spoke for human equality; revolutions in America and France could not have escaped his notice--is that the everyday world consists of what experience is like as it happens, whether the experience is of someone conducting a statistical analysis or someone deciding which cabbage to pick. The corollary is that the reality we perceive is the only reality of which we can speak with certainty.