REPLICATION&FIT



I replicated the first Bell Curve statistical analysis, a logistic regression that uses a model of AGE, Armed Forces Qualification Test (AFQT), and socioeconomic index (SES) scores to predict whether certain NLSY cases lived above or below the POVERTY level. HM's Appendix 4 contains output from JMP, their statistical software, for this (p. 622) and other analyses. I read the variables "zAFQT," "zSES," "zAGE," and "POV89" from the file NATION.TXT, made available by Murray, into STATA and, following the book (Appendix 2, pp. 593-604), excluded all but white non-students without missing values. I then discovered empirically that by also excluding all but NLSY X-Sectional (N=3367) cases, I got numbers identical down to four significant digits.

_another brick in the wall_ -Pink Floyd

This section shows my replication (in the sense of duplicating the numbers published in the book) of HM's POVERTY analysis; the classification table from this analysis shows that their model correctly predicted none of the cases living below the POVERTY level. Note 1 shows the classification table that results from replicating (in the sense of repeating a procedure) HM's model variables as predictors of POVERTY on an independent group of NLSY Supplemental (N=1067) subjects; this table shows that HM's model correctly predicted less than 10% of the subjects living below POVERTY. Since HM's statistical analysis says nothing (in the published analysis) to little (in the independent group replication) about subjects living below POVERTY in the two subsamples, interpretation of its model variable coefficients (Chapter 5, p. 127) is unwarranted.


Table 2--created by pasting HTML tags into my STATA output (click here for documentation)--shows that I got the same numbers HM published (The Bell Curve, Appendix 4, p. 622). The variable coefficients (column 2), standard errors (column 3), and p--probability--values (column 5) in Table 2 agree with theirs down to four significant digits.

The only differences between Table 2 and HM's published output are superficial. My output showed "confidence intervals" and theirs didn't, and mine listed z-scores (column 4) while theirs listed Chi-Squares. The square of my z-score equals their Chi-Square (e.g., for the AFQT, -8.958 × -8.958 = 80.25, the number HM published).
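The relationship between the z-scores and the Chi-Squares can be checked from the figures in Table 2. What follows is a minimal Python sketch, not part of the STATA session described here: the names `table2` and `wald` are illustrative assumptions, while the coefficients and standard errors are the ones reported in Table 2.

```python
# Coefficients and standard errors as reported in Table 2 (STATA output).
# The dictionary layout and the function name are illustrative assumptions.
table2 = {
    "zafqt89": (-0.8376652, 0.0935074),
    "zses": (-0.3300791, 0.0901006),
    "zage": (-0.0238392, 0.0723743),
    "_cons": (-2.648768, 0.076882),
}


def wald(coef, se):
    """Return (z, chi-square): z is coef / se; the 1-df Wald chi-square is z squared."""
    z = coef / se
    return z, z * z


for name, (coef, se) in table2.items():
    z, chi2 = wald(coef, se)
    print(f"{name:8s} z = {z:8.3f}   chi-square = {chi2:8.2f}")
```

Each z printed this way should agree with column 4 of Table 2 after rounding, and its square is the quantity a package reporting Chi-Squares would show instead.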

Table 2. STATA output from replication--using the default prevalence rate, .5--of HM's Bell Curve analysis using the AFQT, SES, and AGE model to predict POVERTY for NLSY white non-student cases without missing values in the X-SECTIONAL(N=3367) subsample (original analysis: The Bell Curve, Appendix 4, p. 622) [data from the file NATION.TXT].

Logit Estimates                                         
Number of obs =   3367                                                        
chi2(3)       = 181.88                                                      
Prob > chi2   = 0.0000
Log Likelihood = -784.40179                             
Pseudo R2     = 0.1039

Variable   Coefficient   Standard Error         z   P>|z|   [95% Confidence Interval]
zafqt89     -.8376652         .0935074     -8.958   0.000   -1.020936    -.654394
zses        -.3300791         .0901006     -3.663   0.000   -.5066731   -.1534851
zage        -.0238392         .0723743     -0.329   0.742   -.1656903    .1180119
_cons       -2.648768          .076882     -34.45   0.000   -2.799454   -2.498082

HM interpreted the coefficient for the AFQT variable, -.84, being larger than that for SES, -.33, to mean that

low intelligence is a stronger precursor of poverty than low socioeconomic background. Whites with IQs in the bottom 5 percent of the distribution of cognitive ability are 15 times more likely to be poor than those with IQs in the top 5 percent (Chapter 5, p. 127).

HM's interpretation was based on the magnitude of these coefficients--and less quantifiable things including a) believing that "g" existed [contemporary psychometrics is not the psychometrics portrayed in the book], b) being relatively naive statistically [e.g., thinking R-Square a good measure of fit for their kind of analysis (p. 617), not looking at residuals from their analysis like the classification table (ye olde "inter-ocular" test), using their statistical package's vanilla prevalence default (.50) when the POVERTY rate in their sample was much lower (.07), and using a statistical model with only main effects, no interaction terms--particularly problematic given the high intercorrelation between AFQT and SES], and c) asking a confirmatory question [it's ever so much easier to confirm than to falsify what one already believes true].

Goodness of fit

A self-evident measure of the goodness of fit for this kind of analysis is the classification table, which crosstabulates predicted (based on the estimated probability for each case) by actual status, here HM's model predictions by NLSY POVERTY status: The top left and bottom right cells below show correctly classified cases (blue background); the bottom left and top right cells show incorrectly classified cases, false positives and false negatives. When a model fits the data, most cases--nothing's perfect--will be classified correctly and fall along the former diagonal.

OK. Cases below the POVERTY level classified below    | Error. Cases above the POVERTY level classified below
Error. Cases below the POVERTY level classified above | OK. Cases above the POVERTY level classified above


Table 3 below (click here for documentation) crosstabulates the model's predictions (rows) by actual status (columns) ... This classification table was not part of STATA's default logistic output: I had to type "lstat" to get it. I suspect it wasn't part of JMP's default output either.


Table 3. Classification table: STATA output from replication of HM's Bell Curve analysis using the AFQT, SES, and AGE model to predict POVERTY [shown in Table 2] for NLSY white non-student cases without missing values in the X-SECTIONAL(N=3367) subsample [data from the file NATION.TXT].





Classified   Actual BELOW   Actual ABOVE   TOTAL
BELOW                   0              3       3
ABOVE                 244           3120    3364
TOTAL                 244           3123    3367

I suspect HM didn't realize that their model classified none of the 244 subjects living below the POVERTY level correctly (likely because they didn't look beyond their statistical package's default logistic output, the stuff of their Appendix 4) ...
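One way to see why so few cases get classified below POVERTY is to ask what it takes to push the estimated probability past the cutoff. The Python sketch below is a hedged illustration, not output from STATA or JMP: it uses the coefficients reported in Table 2 and assumes, purely for illustration, that SES and AGE sit at their means (z = 0). The function names `p_below` and `afqt_needed` are hypothetical.

```python
import math

# Coefficients as reported in Table 2.
B_CONS, B_AFQT, B_SES, B_AGE = -2.648768, -0.8376652, -0.3300791, -0.0238392


def p_below(zafqt, zses=0.0, zage=0.0):
    """Estimated probability of being below POVERTY under the logistic model."""
    logit = B_CONS + B_AFQT * zafqt + B_SES * zses + B_AGE * zage
    return 1.0 / (1.0 + math.exp(-logit))


def afqt_needed(cutoff, zses=0.0, zage=0.0):
    """zAFQT at which the estimated probability equals the given cutoff."""
    target = math.log(cutoff / (1.0 - cutoff))  # logit of the cutoff
    return (target - B_CONS - B_SES * zses - B_AGE * zage) / B_AFQT


print(p_below(0.0))       # estimated probability for a case at the mean on all three variables
print(afqt_needed(0.5))   # zAFQT required to be classified below at the .5 cutoff
print(afqt_needed(0.07))  # same, with the cutoff set near the sample POVERTY rate
```

Under these assumptions, a case with average SES and AGE needs a zAFQT more than three standard deviations below the mean before the .5 cutoff classifies it as below POVERTY; a lower SES score relaxes that requirement somewhat, which `afqt_needed(0.5, zses=-2.0)` can be used to explore.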

Among life's few certainties is that HM's model says nothing about cases below POVERTY--and thus does not support their interpretation of their results, 0% correct being sufficiently precise to preclude equivocating. Their not realizing so is not astonishing. Doing dumb things with numbers is neither uncommon nor limited to interpretations of statistical analyses. 1

Table 4 replicates HM's analysis on that other group of NLSY non-student whites (click here for documentation). HM's model continues to underestimate the number of cases living below POVERTY, although not as grossly as in their published analysis: in this sample, HM's model predicts less than 10% of the cases living below POVERTY correctly, the "better" fit likely due to the POVERTY rate in this subsample being higher and thus closer to the default rate of their statistical package--15% compared to the 7% in the published subsample (thanks to Dr. Tienderen for pointing this out).

Table 4. Classification table: STATA output from replication of HM's Bell Curve analysis using the AFQT, SES, and AGE model to predict POVERTY in an independent group of white non-students without missing values in the NLSY Supplemental(N=1067) subsample [data from the file NATION.TXT].





Classified   Actual BELOW   Actual ABOVE   TOTAL
BELOW                  13             21      34
ABOVE                 144            889    1033
TOTAL                 157            910    1067

Getting a large number of misclassified cases (cases living below POVERTY but classified by HM's model as living above, i.e., false negatives when living below POVERTY is the outcome being predicted) on a second independent group of subjects suggests there is no need to worry about oddities like the distribution of the AGE variable or the fact that the intercorrelation between AFQT and SES is higher than the correlation of either with POVERTY.
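The percentages quoted for the two classification tables can be recomputed from the cell counts. The following Python sketch is only an illustration: the counts are those in Tables 3 and 4, while the function `summarize` and the dictionary layout (classified status first, actual status second) are assumptions made for the example.

```python
def summarize(table):
    """Summarize a 2x2 classification table given as
    {classified: {actual: count}} with keys "below" and "above"."""
    hit_below = table["below"]["below"]    # below POVERTY, classified below
    miss_below = table["above"]["below"]   # below POVERTY, classified above
    false_below = table["below"]["above"]  # above POVERTY, classified below
    hit_above = table["above"]["above"]    # above POVERTY, classified above
    n = hit_below + miss_below + false_below + hit_above
    actual_below = hit_below + miss_below
    return {
        "n": n,
        "poverty_rate": actual_below / n,
        "below_correct": hit_below / actual_below,
        "above_correct": hit_above / (false_below + hit_above),
        "overall_correct": (hit_below + hit_above) / n,
    }


# Cell counts from Table 3 (X-Sectional) and Table 4 (Supplemental).
table3 = {"below": {"below": 0, "above": 3}, "above": {"below": 244, "above": 3120}}
table4 = {"below": {"below": 13, "above": 21}, "above": {"below": 144, "above": 889}}

for label, table in (("Table 3", table3), ("Table 4", table4)):
    stats = summarize(table)
    print(label, {key: round(value, 4) for key, value in stats.items()})
```

The overall percent correct stays high in both tables even though few or none of the below-POVERTY cases are identified, because most cases are above POVERTY; that is why the below-POVERTY row, not the overall rate, is the figure to look at here.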

The value of these two NLSY samples, beyond being independent, lies in their size, not any "representativeness." Both are large enough (3367 and 1067) for the AFQT and SES variables to be normally distributed, which provides a fair test of HM's model.

Having such a high percentage of cases living below POVERTY misclassified as living above in two independent groups is consistent with HM's model not having any explanatory relevance or power. This could be because the variables don't adequately model the phenomena they purportedly explain. It could be due to poor conceptualization, e.g., that problem of discriminant validity. It could be because the statistical model used is so simplistic.

Whatever the reason, since HM's model says nothing about cases living below POVERTY, it's pointless to interpret its results.

In summary, this web page describes the first logistic regression published in The Bell Curve, in which POVERTY status--either above or below an official level--was predicted by a model consisting of AFQT, SES, and AGE scores from a decade earlier. I replicated the numbers published in the book's Appendix 4. I then looked at how well the model fit: it didn't, not at all in the published sample (N=3367), not much in another independent one (N=1067). I suspect HM didn't realize they'd made a Type I error, that their model's predictive accuracy for cases living below the POVERTY level was 0%. Their model does not illuminate the problem of poverty, and their discussion likely does not address questions about other phenomena since the same 3-variable regression model was used throughout: that discriminant validity problem identified in the Subject's section is unlikely to disappear in other NLSY slices. This web page speaks only to the technical merit of The Bell Curve POVERTY analysis: it has none, other than as a cautionary tale.



1 My closest encounter went as follows: Imagine first data from a 180-day-long study of 19 subjects with herpes who had been randomly assigned to an initial condition (drug or placebo) and, on the first day of the study, taken 10 tests--empirical measures of standard social-research constructs like personality, locus of control, and dominance--and then daily reported their stress level and whether they thought the virus active, with the latter also clinically assayed (double blind) on a daily basis. Here, a chance to look at self-reported stress--what better source?--by clinically-confirmed viral activity--again, a gold standard: lesions (the subject knew) and shedding (s/he didn't) by stress. Imagine you decide to graph the data--and your SAS GRAPH plot for one subject, initially in the placebo condition, looked like this:

[Figure: informative data plot]

Imagine now someone insisting that intercorrelating a single stress score averaged over the entire 180-day study period with scores on the 10 initially administered tests--11 variables for 19 subjects (you can obtain the theoretically unobtainable, perfection, R-square = 1.0, by using a regression equation with as many variables as cases), doing so certain to inflate the Type I error rate--is more informative ... There is a world of reason--in the Kantian sense--and there are worlds of commerce and government in which people do dumb things with numbers. HM's interpretive error ranks IMHO as garden-variety mundane (its impact is another matter). Where you have humans, you're going to have people doing dumb--and smart--things with numbers. Everything's a footnote to Kuhn.
