Overview
I replicated the first Bell Curve statistical analysis, a logistic regression using a model (AGE, Armed Forces Qualification Test (AFQT), and socioeconomic index (SES) scores) to predict whether certain NLSY cases lived above or below the POVERTY level; HM's Appendix 4 contains JMP (their statistical software) output from this (p. 622) and other analyses. I read the variables "zAFQT," "zSES," "zAGE," and "POV89" (from the file NATION.TXT made available by Murray) into STATA and, following the book (Appendix 2, pp. 593-604), excluded all but white nonstudents without missing values. I then discovered empirically that by also excluding all but NLSY X-Sectional (N=3367) cases, I got numbers identical down to four significant digits.
This section shows my replication (in the sense of duplicating the numbers published in the book) of HM's POVERTY analysis; the classification table from this analysis shows that their model predicted none of the cases living below the POVERTY level correctly. Note 1 shows the classification table resulting from replicating (in the sense of repeating a procedure) HM's model variables as predictors of POVERTY on an independent group of NLSY Supplemental (N=1067) subjects; this table shows that HM's model predicted less than 10% of the subjects living below POVERTY correctly. Since HM's statistical analysis says little (in the independent-group replication) to nothing (in the published analysis) about subjects in the two subsamples living below POVERTY, interpretation of its model variable coefficients (Chapter 5, p. 127) is unwarranted.
The only differences between Table 2 and HM's published output are superficial: my output showed "confidence intervals" and theirs didn't; mine listed z-scores (column 4) while theirs listed Chi-Squares. The square of my z-score equals their Chi-Square (e.g., for the AFQT, 8.958 × 8.958 = 80.25, the number HM published).
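The z-score and Chi-Square relationship can be checked directly from the coefficients and standard errors printed in Table 2. The following is a minimal sketch, not part of the original analysis: the function name `wald` and the dictionary layout are illustrative choices, and the coefficients are entered as magnitudes because the square does not depend on the sign.

```python
# Coefficient and standard error for each predictor, as printed in Table 2.
# Magnitudes only: squaring z removes the sign.
table2 = {
    "zafqt89": (0.8376652, 0.0935074),
    "zses": (0.3300791, 0.0901006),
    "zage": (0.0238392, 0.0723743),
}


def wald(coef, se):
    """Return (z, chi_square) for one coefficient: z = coef / se, chi_square = z ** 2."""
    z = coef / se
    return z, z * z


for name, (coef, se) in table2.items():
    z, chi2 = wald(coef, se)
    print(f"{name:8s} z = {z:6.3f}   chi-square = {chi2:6.2f}")
```

For the AFQT row this gives a z of about 8.958 and a Chi-Square of about 80.25, matching the figures quoted above.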
Table 2. STATA output from replication (using the default prevalence rate, .5) of HM's Bell Curve analysis using the AFQT, SES, and AGE model to predict POVERTY for NLSY white nonstudent cases without missing values in the X-SECTIONAL (N=3367) subsample (original analysis: The Bell Curve, Appendix 4, p. 622) [data from the file NATION.TXT].
Logit Estimates                    Number of obs = 3367
                                   chi2(3)       = 181.88
                                   Prob > chi2   = 0.0000
Log Likelihood = -784.40179        Pseudo R2     = 0.1039

Variable   Coefficient   Standard Error   z        P>|z|   [95% Confidence Interval]
zafqt89    -.8376652     .0935074         -8.958   0.000   -1.020936   -.654394
zses       -.3300791     .0901006         -3.663   0.000   -.5066731   -.1534851
zage       -.0238392     .0723743         -0.329   0.742   -.1656903    .1180119
_cons      -2.648768     .076882          -34.45   0.000   -2.799454   -2.498082
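The header figures of Table 2 are tied together by simple arithmetic, which gives a quick consistency check on the replication. This sketch assumes that the reported chi2(3) is the likelihood-ratio statistic, 2 × (LL_model − LL_null), and that the Pseudo R2 is McFadden's, 1 − LL_model / LL_null; the log likelihood is taken as negative, as it must be for a logistic model. The function names are illustrative.

```python
# Header values from Table 2; the log likelihood is entered as a negative number.
LOG_LIK = -784.40179
LR_CHI2 = 181.88


def null_log_lik(log_lik, lr_chi2):
    """Log likelihood of the intercept-only model implied by the LR chi-square."""
    return log_lik - lr_chi2 / 2.0


def pseudo_r2(log_lik, lr_chi2):
    """McFadden's pseudo R-square: 1 - LL_model / LL_null."""
    return 1.0 - log_lik / null_log_lik(log_lik, lr_chi2)


print(f"implied null log likelihood = {null_log_lik(LOG_LIK, LR_CHI2):.5f}")
print(f"implied pseudo R2           = {pseudo_r2(LOG_LIK, LR_CHI2):.4f}")
```

Under those assumptions the implied pseudo R2 rounds to the 0.1039 printed in the table.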
HM interpreted the coefficient for the AFQT variable, .84, being larger than that for SES, .33, to mean that
low intelligence is a stronger precursor of poverty than low socioeconomic background. Whites with IQs in the bottom 5 percent of the distribution of cognitive ability are 15 times more likely to be poor than those with IQs in the top 5 percent (Chapter 5, p. 127). 
A self-evident measure of the goodness of fit for this kind of analysis is the classification table, which cross-tabulates predicted status (based on the estimated probability for each case) by actual status, here HM's model predictions by NLSY POVERTY status. The top left and bottom right cells below show correctly classified cases (blue background); the bottom left and top right cells show incorrectly classified cases, false positives and false negatives. When a model fits the data, most cases (nothing's perfect) will be classified correctly and fall along the former diagonal.
OK. Cases below the POVERTY level classified below  Error. Cases above the POVERTY level classified below
Error. Cases below the POVERTY level classified above  OK.Cases above the POVERTY level classified above 
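A classification table of this kind can be sketched in a few lines. This is an illustration rather than STATA's own procedure: the function name, the rule "classified below when the estimated probability is at least the cutoff," and the six toy cases are all assumptions made for the example, not NLSY data.

```python
def classification_table(probs, actual, cutoff=0.5):
    """Cross-tabulate predicted (first key) by actual (second key) status.

    probs  -- estimated probability of living below POVERTY, one per case
    actual -- 1 if the case actually lived below POVERTY, 0 if above
    """
    table = {
        ("below", "below"): 0,  # correct
        ("below", "above"): 0,  # error
        ("above", "below"): 0,  # error
        ("above", "above"): 0,  # correct
    }
    for p, a in zip(probs, actual):
        predicted = "below" if p >= cutoff else "above"
        observed = "below" if a == 1 else "above"
        table[(predicted, observed)] += 1
    return table


# Hypothetical cases: every estimated probability is under .5, so every case
# is classified "above" regardless of its actual status.
toy_probs = [0.05, 0.10, 0.30, 0.45, 0.08, 0.20]
toy_actual = [0, 0, 1, 1, 0, 1]

table = classification_table(toy_probs, toy_actual)
for (predicted, observed), count in table.items():
    print(f"classified {predicted:5s} / actually {observed:5s}: {count}")
```

In the toy data the top row of the table is empty: no case is classified below, so none of the cases actually below POVERTY can be classified correctly.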
Table 3 below cross-tabulates the model's predictions (rows) by actual status (columns) ... This classification table was not part of STATA's default logistic output: I had to type "lstat" to get it. I suspect it wasn't part of JMP's default output either.
Table 3. Classification table: STATA output from replication of HM's Bell Curve analysis using the AFQT, SES, and AGE model to predict POVERTY [shown in Table 2] for NLSY white nonstudent cases without missing values in the X-SECTIONAL (N=3367) subsample [data from the file NATION.TXT].


I suspect HM didn't realize that their model classified none of the 244 subjects living below
the POVERTY level correctly (likely because they didn't look beyond their
statistical package's default logistic output, the stuff of their Appendix
4) ...
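One way to see how a 0% hit rate can arise is to compute the probabilities the fitted equation produces. The sketch below uses the Table 2 estimates with negative signs (the ordering of the confidence limits implies them) and assumes that POV89 is coded 1 for cases below the POVERTY level and that a case is classified below only when its estimated probability reaches .5. The function names are illustrative.

```python
import math

# Table 2 estimates, taken as negative (implied by the confidence limits).
CONS = -2.648768
B_AFQT = -0.8376652
B_SES = -0.3300791
B_AGE = -0.0238392


def prob_below(zafqt, zses, zage=0.0):
    """Estimated probability of living below POVERTY for given z-scores."""
    linear = CONS + B_AFQT * zafqt + B_SES * zses + B_AGE * zage
    return 1.0 / (1.0 + math.exp(-linear))


def afqt_needed_for_half(zses=0.0, zage=0.0):
    """AFQT z-score at which the estimated probability equals exactly .5."""
    return -(CONS + B_SES * zses + B_AGE * zage) / B_AFQT


print(f"average case:               p = {prob_below(0.0, 0.0):.3f}")
print(f"AFQT and SES both at -2 SD: p = {prob_below(-2.0, -2.0):.3f}")
print(f"AFQT z needed at mean SES:  {afqt_needed_for_half():.2f}")
```

Under these assumptions an average case gets a probability of about .07, and even a case two standard deviations below the mean on both AFQT and SES stays under .5; at mean SES and AGE the AFQT score would have to be more than three standard deviations below the mean before the case would be classified below POVERTY.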
Among life's few certainties is that HM's model says nothing about cases below POVERTY, and thus does not support their interpretation of their results, 0% correct being sufficiently precise to preclude equivocating. Their not realizing so is not astonishing. Doing dumb things with numbers is neither uncommon nor limited to interpretations of statistical analyses. ^{1}
Table 4 replicates HM's analysis on that other group of NLSY nonstudent whites. HM's model continues to underestimate the number of cases living below POVERTY, although not as grossly as in their published analysis: in this sample, HM's model predicts less than 10% of the cases living below POVERTY correctly. The "better" fit is likely due to the POVERTY rate in this subsample being higher (15% compared to the 7% in the published subsample) and thus closer to the default rate of their statistical package (thanks to Dr. Tienderen for pointing this out).
Table 4. Classification table: STATA output from replication of HM's Bell Curve analysis using the AFQT, SES, and AGE model to predict POVERTY in an independent group of white nonstudents without missing values in the NLSY Supplemental (N=1067) subsample [data from the file NATION.TXT].

Getting a large number of false positives (cases living below POVERTY
but classified by HM's model as living above) on a second
independent group of subjects suggests there is no need to worry about
inexplicable oddities like the distribution of the AGE variable, or that
the intercorrelation between AFQT and SES is higher than the correlation
of either with POVERTY.

Whatever the reason, since HM's model says nothing about cases living below POVERTY, it's pointless to interpret its results.

In summary, this web page describes the first logistic regression published in The Bell Curve, in which POVERTY status (either above or below an official level) was predicted by a model consisting of AFQT, SES, and AGE scores from a decade earlier. I replicated the numbers published in the book's Appendix 4. I then looked at how well the model fit: it didn't, not at all in the published sample (N=3367), not much in another independent one (N=1067). I suspect HM didn't realize they'd made a Type I error, that their model's predictive accuracy for cases living below the POVERTY level was 0%. Their model does not illuminate the problem of poverty, and their discussion likely does not address questions about other phenomena since the same 3-variable regression model was used throughout: that discriminant validity problem identified in the Subject's section is unlikely to disappear in other NLSY slices. This web page speaks only to the technical merit of The Bell Curve POVERTY analysis: it has none, other than as a cautionary tale.
^{1} My closest encounter went as follows: Imagine first data from a 180-day-long study of 19 subjects with herpes who had been randomly assigned to an initial condition (drug or placebo) and, on the first day of the study, taken 10 tests (empirical measures of standard social-research constructs like personality, locus of control, and dominance) and then daily reported their stress level and whether they thought the virus active, with the latter also clinically assayed (double blind) on a daily basis. Here, a chance to look at self-reported stress (what better source?) by clinically confirmed viral activity (again, a gold standard): lesions (the subject knew) and shedding (s/he didn't) by stress. Imagine you decide to graph the data, and your SAS GRAPH plot for one subject, initially in the placebo condition, looked like this:
Imagine now someone insisting that intercorrelating a single stress score averaged over the entire 180-day study period with scores on the 10 initially administered tests, 11 variables for 19 subjects (you can obtain the theoretically unobtainable, perfection, R-square = 1.0, by using a regression equation with as many variables as cases), doing so certain to inflate the Type I error rate, is more informative ... There is a world of reason, in the Kantian sense, and there are worlds of commerce and government in which people do dumb things with numbers. HM's interpretive error ranks IMHO as garden-variety mundane (its impact is another matter). Where you have humans, you're going to have people doing dumb (and smart) things with numbers. Everything's a footnote to Kuhn.