Grading the State of U.S. Education

Claudia Krenz, Ph.D. (datafriend @ gmail-.-com)

Abstract

This page summarizes the input-output model on which the educational accountability testing movement is based (high standards, a positive school climate, qualified teachers, and adequate resources lead to higher student achievement). It explicates how Education Week on the Web (EWW), a publication of Editorial Projects in Education, operationalized or measured each of the four "inputs"--high standards, positive school climate, qualified teachers, and adequate resources--and the one "output," student achievement. This page focuses on the standards and achievement variables.

High standards were operationalized as the degree to which each state had implemented standards-based accountability testing (from the 1998 "report cards" EWW assigned each of the states). The standards data were cumulative, reflecting how many accountability tests the states had implemented or were planning. For student achievement, EWW used the percent scoring proficient or higher on the 1996 administration of the National Assessment of Educational Progress (NAEP) math test, from the period before the NAEP lost its scientific governance.

Here is a sorted list showing the degree of standards-based accountability testing implemented by state (high scores on this list indicate more accountability testing, i.e., more students were taking more state-based achievement tests in more subjects). Here is a sorted list showing the % proficient on NAEP 8th grade math by state (high scores indicate higher student achievement, i.e., more students scoring at the proficient level or above). Results on both lists for Alaska, California, and Texas are highlighted for comparative purposes: More Alaska students, for example, scored at the proficient level or higher than did their peers in Texas. Alaska students, however, took far fewer state-based accountability tests than did their peers in Texas.

Student achievement was correlated with standards--as well as with the other three "inputs." The correlation coefficient between student achievement and standards-based accountability testing was -.39 on the NAEP 8th grade math test and -.20 on the 4th grade NAEP math test. These statistically significant correlations are in the direction opposite that postulated by the input-output model.

Different interpretations of this coefficient are presented and discussed. It may be, for example, that implementing accountability testing has the opposite effect of that intended, i.e., that it leads to decreased student achievement. Or it may be that, since the achievement test data came from 1996 and the standards data from 1998, lower student achievement leads legislators to mandate more accountability testing. The examples of Alaska--which only began implementing accountability testing in 1998--and Texas--which began its accountability testing programs in the early 1990s--support the former interpretation, as does the fact that the negative correlation for 8th graders--who would have been exposed to more accountability testing--is nearly double that for 4th graders.

The model and intercorrelations between the "output," student achievement, and the "inputs"--high standards, a positive school climate, qualified teachers, and adequate resources

The model shown in Figure 1 below illustrates a commonly held view of student achievement: it increases as high standards, a positive school climate, qualified teachers, and adequate resources increase. Table 1 below shows statistically significant negative correlations between student achievement and implementation of standards-based accountability testing: -.39 (8th graders) and -.20 (4th graders).1 A negative correlation is the opposite of what proponents of standards-based reforms expect, their goal being increased student achievement (something on which we all agree). This negative correlation raises the question of whether increased accountability testing is having its intended palliative effect: rather, the negative correlation suggests that increasing the time spent preparing for and taking accountability tests could be having an effect opposite its intent.


Figure 1. Model of Factors or "Inputs" Thought to Facilitate Student Achievement.*

[diagram: pathways-to-achievement model]

Table 1. Pearson Product-Moment Correlations Between Model Variables: NAEP 1996 Math % 4th and 8th Graders Scoring Proficient (truncated to two decimal places) and EWW 1998 Grades for Each State (as published online in their State "Report Cards" Special Section). I used a UNIX script to extract and sort the data by state and then analyzed them with STATA (gr stand98, hist; corr ep4_m96-stand98, using the raw data matrix in Appendix 3 below).
                         Achievement:                "Inputs"
                         NAEP Math (% proficient)    (from EWW State Grades)
                         4thGrade  8thGrade   SResources  TQuality  SClimate  STesting
Achievement: 4thGraders    1.00
Achievement: 8thGraders    0.90      1.00
SchoolResources            0.28      0.17       1.00
TeacherQuality             0.04     -0.17       0.19        1.00
SchoolClimate              0.53      0.55       0.27       -0.15      1.00
StandardsTesting          -0.20     -0.39       0.11        0.47     -0.36      1.00
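
For readers without STATA, here is a minimal sketch of the same computation in Python/pandas (my translation, not the original workflow; the file name eww98.txt is hypothetical and should hold the Appendix 3 matrix verbatim):

    # Minimal sketch, assuming the Appendix 3 matrix was saved verbatim as
    # eww98.txt (hypothetical name); "." marks missing values, and the "|"
    # header lines are skipped as comments.
    import pandas as pd

    cols = ["state", "ep4_m96", "ep8_m96", "res98", "tqual98", "sclim98", "stand98"]
    df = pd.read_csv("eww98.txt", sep=r"\s+", names=cols, na_values=".", comment="|")
    # Pairwise Pearson correlations; note pandas deletes missing values
    # pairwise, while STATA's corr deletes listwise, so pairs involving the
    # missing NAEP scores can differ slightly.
    print(df[cols[1:]].corr().round(2))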

* Input-output models of achievement have influenced U.S. public education since its inception (Callahan, 1962; Cremin, 1961; Nasaw, 1979; Wise, 1979). North Dakota, for example, one of the late adopters,2 announced its new standards-based accountability testing program online, saying it would "assure businesses that prospective employees are well educated and prepared for the requirements of the business world." Standards-based reform, a phenomenon of the last dozen-plus years, is "results oriented," ostensibly focusing on the "output," achievement.


The correlation coefficient between student achievement and standards-based accountability testing was -.39 on the NAEP 8th grade math test and -.20 on the 4th grade NAEP math test. These statistically significant correlations are in the direction opposite that postulated by the input-output model. It may be, for example, that implementing accountability testing has the opposite effect of that intended, i.e., that it leads to decreased student achievement. Or it may be that, since the achievement test data came from 1996 and the standards data from 1998, lower student achievement leads legislators to mandate more accountability testing. The examples of Alaska--which only began implementing accountability testing in 1998--and Texas--which began its accountability testing programs in the early 1990s--support the former interpretation, as does the fact that the negative correlation for 8th graders--who would have been exposed to more accountability testing--is nearly double that for 4th graders.

Other interpretations of this negative correlation are possible.

The relationship between legislated reforms and student achievement could be illuminated by using more recent NAEP scores; the problem, as noted above, is that the NAEP has lost its scientific governance.

Correlations are, in any case, never conclusive, and the preceding--whatever its interpretation--is no exception. In summary, the empirical results shown in Table 1--from intercorrelating the NAEP achievement variables with the EWW variables--raise a question about one of the assumptions of standards-based reform. Said results--being, at best, correlation coefficients between two solid, substantive variables (NAEP math % proficient and the EWW standards variable)--do not, of course, "prove" anything. Perhaps we should, as did the testing wing of the National Research Council when first convened in 1997 to discuss a national test, focus as much on minimizing potential risks and unintended consequences as on maximizing intended benefits. Myself, I'm watching for more data: these data suggest that one unintended consequence of the individual state implementations of accountability programs may have been to decrease achievement by reducing the amount of class time devoted to learning and thus students' opportunities to learn. Could it be that accountability testing is, like kudzu, strangling student learning?

Footnotes

1 Data. Achievement data are from the 1996 National Assessment of Educational Progress math tests (see here for a crosstabulation of states by testing intervals: 1990, 1992, 1996, 2000, and 2003). More recent NAEP data are not used here because, as with the Iraq pre-war intelligence, there are doubts; it is only sensible to doubt. Standards implementation data are from the "report card" Education Week on the Web (EWW) gave each state in 1998. EWW based its grades on the factors thought to enhance student achievement: implementation of accountability testing, i.e., having "standards" (plus teacher quality, school resources, and school climate). EWW created its standards variable by summing the number of different standards-based reforms each state had implemented: 4 points for states administering diploma-sanctioning tests, 2 for states planning them; 3 points for states administering accountability testing in reading, writing, and math--which is where all the states started--and an extra point for states with a social studies test, etc. The Appendices summarize how EWW conceptualized these "inputs" (and illustrate them with the grades it gave to Alaska and Texas). Appendix 1 shows the summary grades EWW assigned to two states, AK and TX. Appendix 2 quotes EWW's methods--the individual items and their scoring protocols--and illustrates them with EWW's grades for those two states. Appendix 3 shows the raw data from which the correlation matrix was generated.
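
To make the cumulative scoring concrete, here is a schematic sketch (the function and field names are mine, and EWW's full point scheme had more categories than the ones named above):

    # Schematic sketch of EWW's cumulative standards scoring, per the
    # description above (field names are hypothetical; the real scheme
    # covered more reforms than these).
    def standards_points(state):
        points = 0
        if state.get("diploma_test_administered"):
            points += 4          # administering a diploma-sanctioning test
        elif state.get("diploma_test_planned"):
            points += 2          # merely planning one
        if state.get("tests_reading_writing_math"):
            points += 3          # accountability tests in the 3 starter subjects
        if state.get("social_studies_test"):
            points += 1          # extra point for a social studies test
        return points

    # A state administering a diploma test plus reading/writing/math and
    # social studies tests would accumulate 4 + 3 + 1 = 8 points.
    print(standards_points({"diploma_test_administered": True,
                            "tests_reading_writing_math": True,
                            "social_studies_test": True}))   # -> 8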

--
* In Table 1's correlation matrix, as expected, the highest positive correlation, .90, is between the two achievement variables: student achievement levels, expressed as a % proficient on a particular standardized test, vary across states, i.e., the %s proficient of 4th graders and 8th graders within a state are more like each other than they are like those in other states (eyeball the 4th- and 8th-grade % proficient columns across the 50 state rows in Appendix 3 to see for yourself). Also not surprising, the achievement and "school climate" variables are highly correlated. The low correlations between the achievement variables and the resources and teacher quality variables may be due to the latter being noisy, messy variables (scan EWW's computational procedures in Appendix 2 to judge for yourself).

2 That the standards implementation variable is skewed suggests another possible explanation for the negative relationship between student achievement and legislator-imposed testing: one of the assumptions of the general linear model--albeit one to whose violation it is robust--is that variables be normally distributed.

The plot below illustrates the degree to which the different states had implemented standards-based reforms--required more accountability testing--by 1998. The distribution is "negatively skewed," meaning that, by 1998, most states had jumped onto the standards bandwagon.

Figure 2. STATA Plot (Histogram) of the 1998 EWW Standards Variable.
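
For completeness, a sketch of how one might reproduce the histogram and quantify the skew, reusing the hypothetical eww98.txt from above:

    # Sketch: plot the 1998 standards variable and check its skew.
    import pandas as pd
    import matplotlib.pyplot as plt

    cols = ["state", "ep4_m96", "ep8_m96", "res98", "tqual98", "sclim98", "stand98"]
    df = pd.read_csv("eww98.txt", sep=r"\s+", names=cols, na_values=".", comment="|")
    print(df["stand98"].skew())   # a negative value indicates negative (left) skew
    df["stand98"].plot(kind="hist", bins=10, title="1998 EWW standards grades")
    plt.show()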

In particular, by last century's end, most U.S. states had independently implemented "standards-based" reforms, specifying what students needed to master and testing them. Students were held accountable by having to pass "exit exams" to graduate from high school; schools were held accountable by being assigned "report cards" formed by aggregating their students' scores (ignoring those demographic characteristics with which the school is associated).

Just so, were we to aggregate WMD access over gender--just as homebrew test scores are aggregated over schools--we'd do it individual by individual, 1 meaning some kind of access, 0 meaning none whatsoever. Since we coded 0s and 1s, the mean of the scores within each group is equivalent to a percentage: the percentage of men and the percentage of women with access to WMD. Common sense says that the percentage of men with access to WMD will be much greater than the percentage of women (myself, I don't know any such women--but there have to be some: otherwise we'd face the impossible, division by 0). Would it be reasonable to interpret this difference in proportions to mean that men have WMD "personalities," "streaks," "characteristics," traits?

In a 1673 satire, Le Malade Imaginaire, Moliere describes a medieval doctoral exam where the candidate is asked to explain why opium puts people to sleep: "because, learned doctors, it has a dormitive principle." Although the candidate has correctly identified the relation between two factors, no one with any common sense would conclude he's explained anything (Bateson would later call this "reasoning by a dormitive principle").

Or would it be more reasonable, more parsimonious, easier, to explain that many complex socio-econo-ethno-governmental-geo issues impact the phenomenon of having access to WMD--that the gender variable is, more succinctly, "confounded" with the others?
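
To make the aggregation point concrete, a toy sketch (all numbers invented):

    # Toy illustration of the aggregation point above: the mean of 0/1 codes
    # is just the proportion of 1s in each group (all numbers invented).
    men   = [1, 0, 0, 0, 1, 1, 0, 1]    # 1 = some kind of access, 0 = none
    women = [0, 0, 0, 0, 1, 0, 0, 0]
    pct_men   = 100 * sum(men) / len(men)       # 50.0
    pct_women = 100 * sum(women) / len(women)   # 12.5
    # The difference in proportions says nothing, by itself, about *why* the
    # groups differ: the grouping variable may be confounded with others.
    print(pct_men, pct_women)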
Early Adopters By the early 1990s a dozen U.S. states had implemented such reforms. One such adopter was Texas. Another was California, which--legislatively divided, it is said, on whether learning algebra was important--jumped onto a "new" new-age math standard ... and onward into the academic cellar, giving rise to parental protests, one in the form of the Mathematically Correct Home Page ("yes, Virginia," it says, "there is a right answer"). Another early adopter, Virginia, mandated *abracadabra* that its statewide test results account for students who had earlier failed but, after remedial work, passed--by adding their scores to the numerator of its overall passing rate but not their number to its denominator, inexorably inflating it, theoretically to over 100% (Goldhaber, 2002).

Late Adopters By the late 1990s, only a few states had not jumped onto the bandwagon. Alaska was a late adopter, though not as late as that unflappable continental phalanx constituted by Idaho, Montana, and North Dakota. Alaska jumped aboard because its legislature was convinced by oil companies like Exxon, Tesoro, and Unocal (headquartered in CA and TX) that its students were insufficiently educated to be "good" workers ...

Two years earlier, on the National Assessment of Educational Progress (NAEP) math test, the closest the U.S. had to a "gold standard," 30% of AK's 8th graders--compared to 21% of TX's and 17% of CA's--scored "proficient."

3 The NCLB is up for renewal. Should it be renewed? I think not! The NCLB, like the Iraq War, had broad bipartisan support when signed into law in January 2002. It, like the Iraq War, was under-budgeted but, like the Iraq War, lined the pockets of some (test publishers, for example) at the expense of others (the schools themselves). It, like the Iraq War, raised issues of sovereignty: who is in charge? No one, apparently.

Take one silly example, a press release in which the current Secretary of Education chastised Utah--which had, in 2005, openly challenged parts of the NCLB (unlike Alaska, which caved completely) with its House Bill 1001--because the percentages of students passing its homebrew accountability test weren't identical to those on the federal NAEP test: as the two tests--federal and state--had not been standardized to each other in any way, the wonder would have been had the actual numbers (%s passing) been identical! There is no reason to expect better decisions from USED than from, say, FEMA (USED was, though, better prepared to have new schools across the country enroll children made refugees by hurricane Katrina [it began sending letters to chief school officers on 9/2/04]).


Appendices

Appendix 1: 1998 Ed Week Summary Grades for Alaska and Texas

As shown below, Alaska scored higher than Texas in 8th-grade achievement (I'm willing to call the 4th-grade math scores "randomly different" if you are). Texas scored at least one grade point higher than Alaska in all the EWW "input" categories except School Climate. Were we to interpret this table literally, we'd conclude that lax standards, lousy teaching, and negligible resources pave the way to high academic performance. This table, however, showing grades for only two states, does not form an adequate basis for generalization.

Summary Grades Given to Alaska and Texas by Ed Week in 1999.

                                                            GRADE
                                                      Alaska       Texas

Achievement
  % 4th graders proficient, NAEP math (1996)            21           25
  % 8th graders proficient, NAEP math (1996)            30           21

Standards and Assessment
  Assessment (30% of grade); Standards (50%);
  Accountability (20%)                                67 D+        88 B+

Teacher Quality
  Performance-based Licensing system (40% of grade);
  In-field (20%); Professional Development (20%);
  Teacher Education (20%)                             69 D+        78 C+

School Climate
  Class Size (35% of grade); Student Engagement (20%);
  Parent Involvement (20%); School Autonomy (25%)     70 C-        76 C

Resources
  Adequacy, Allocation, and Equity (each 33% of grade) 59 F        86 B


Appendix 2: 1998 Ed Week Items Predicting Achievement and their Scoring Protocols

[N.B. This appendix is a paste--with HTML "tags" added--from the online tables showing EWW's 1998 "input" items and grades for the 50 states (enumerated in note 3 above). Its sole purpose is didactic.]

The first EWW item within each "input" category and its scoring protocol are discussed below, again using Alaska and Texas as examples.


These four tables paste all the individual items EWW posted for each "input" category (merged with its grades for Alaska and Texas). Each "input" table is followed by EWW's scoring protocols. The purpose of this appendix is to facilitate--by combining into one file what had been presented in five--the reader's understanding of how EWW calculated these grades.


Standards and Assessment

Abbreviations
Subject Areas
E = English; M = Math; S = Science; SS = Social Studies.

School Level
H = high school; M = middle school; E = elementary.

Assessment Types
PRF = Performance assessment, CRT = Criterion-referenced test, NRT = Norm-referenced test, WR = Writing assessment.

Other
N/A = document was not available for review, "nada" = a blank space in published table.

                                                              Alaska           Texas
ASSESSMENT 30% OF GRADE
#1 How does the state measure student performance?
   (Fall 1998)                                                NRT    D         CRT,WR,PRF  A + 3 pnts
#2 Which subjects are tested using assessments aligned
   to state's standards? (Fall 1998)                          None   F         E,M,S,SS    A
STANDARDS 50% OF GRADE
#3 Has the state adopted standards in the 4 core academic
   subjects? (December 1998)                                  Yes    A         Yes         A
#4 How clear and specific are the state's English/
   language arts standards? (Fall 1998)                       nada   0         EH          2
#5 How clear and specific are the state's mathematics
   standards? (Fall 1998)                                     EMH    3         EMH         3
#6 How clear and specific are the state's science
   standards? (Fall 1998)                                     n/a    0         EM          2
#7 How clear and specific are the state's social studies
   standards? (Fall 1998)                                     n/a    0         nada        0
                                                              3/12 = .25       7/12 = .58
ACCOUNTABILITY 20% OF GRADE
#8 Students must master 10th grade standards to graduate
   (Fall 1998)                                                Future C         No          F
#9 Did the state participate in the 1998 NAEP exams?          No     F         Yes         A
#10 How does the state hold schools accountable for
    performance? (November 1998)
      Report Cards                                            Yes    A         Yes         A
      Ratings                                                 No     F         Yes         A
      Rewards                                                 No     F         Yes         A
      Assistance                                              No     F         Yes         A
      Sanctions                                               No     F         Yes         A

Scoring protocols for the Standards & Assessment category in item order

1 PRF = A; CRT = B if aligned to state goals; CRT = C if not so aligned; NRT = D; none = F; 3 extra points for WR.
2 4 basic subjects = A, 3 = B, 2 = C, 1 = D, none = F.
3 4 = A, 2-3 = B, 1 = C, under development = D, none & no plans = F.
#4-#7 Used the AFT's "Making Standards Matter" (1998) to content-analyze the clarity and specificity of the different standards. The total of the ratings was then divided by 12.
8 Yes = A, Future = C, No = F [2.5% of grade].
9 Yes = A, No = F [2.5% of grade].
10 Yes = A, No = F for each of the 5 ways of holding schools accountable [15% of grade].

Teacher Quality

Abbreviations
BASIC = basic skills, PED = pedagogy, SUBJ = subject matter, PROG = state-adopted teacher competency standards that hold teacher-training programs responsible for them.
                                                              Alaska           Texas
PERFORMANCE-BASED LICENSING SYSTEM 40% OF GRADE
#1 State has adopted standards for new teachers (1998)        Program  C       Yes        A
#2 State has assessment(s) to measure whether new
   teachers meet standards (1998)                             BASIC    D       PED,SUBJ   A
#3 State requires and funds an induction program for new
   teachers (1998)                                            No       F       No         F
#4 State requires assessment of the classroom performance
   of new teachers (1998)                                     No       F       No         F
#5 Number of national-board-certified teachers (1998)         7                7
#6 State provides incentives for teachers to seek national
   board certification (1998)
      License portability                                     nada  0          nada  0
      License renewal                                         nada  0          nada  0
      Fee supports                                            nada  0          nada  0
      Pay supplement                                          nada  0          nada  0
                                                              C                C
IN-FIELD 20% OF GRADE
#7 % secondary teachers who hold a degree in the subject
   they teach (1994)                                          64               51
PROFESSIONAL DEVELOPMENT 20% OF GRADE
#8 State requires time for professional development (1998)    Yes  A           Yes  A
#9 State provides professional-development opportunities
   (1998)                                                     Yes  A           Yes  A
#10 State provides funds for local professional-development
    activities (1998)                                         No   F           Yes  A
TEACHER EDUCATION 20% OF GRADE
#11 State requires an academic major for certification of
    secondary teachers (1998)                                 Yes  A           Yes  A
#12 State requires K-12 standards be used in teacher
    education (1998)                                          No   F           Yes  A
#13 % new graduates from NCATE-accredited institutions
    (1997)                                                    0                49
#14 State requires early and varied field experiences prior
    to student teaching (1998)                                Yes  A           Yes  A
#15 State has a student teaching requirement (1998)           Yes  A           Yes  A

Scoring protocols for the Teacher Quality category in item order

1 Standards stating what new teachers should know and do = A, having such standards but only using them to approve education schools or relying solely on education schools to see that teachers meet standards = C, no formal standards but working on it = D, no standards & no plans for any = F.
2 Testing subject-matter and teaching knowledge = A, testing only subject matter = B, testing only teaching = C, testing basic skills = D, nothing = F.
3 Requiring new teachers to participate in induction programs including mentoring by experienced teachers = A, requiring induction programs for only some teachers or requiring but not funding programs = C, no induction program = F.
4 Tying assessment of classroom teaching to licensure = A, not assessing classroom teaching, or assessing it but not tying it to licensure = F.
5 Not included in grade.
6 Providing at least one incentive = A, providing no incentives = C (seriously, this is what it said).
7 Score = the % of teachers holding degrees in their fields.
8 Districts that set aside days or accumulate credits for professional development = A, no time = F.
9 Providing money for state-level professional development = A, no earmarked money = F.
10 Providing money for local professional development = A, no earmarked money = F.
11 Requiring an academic major = A, requiring credit hours = C, no requirement = F.
12 Requiring teacher-preparation programs tied to K-12 academic standards = A, not doing so = F.
13 Score = the % of teachers from NCATE-approved schools.
14 Requiring K-12 field experience prior to student teaching = A, not requiring it but it's always part of teacher training = C, nothing = F.
15 Requiring student teaching = A, not requiring it but it's part of teacher training = C, nothing = F.

School Climate

                                                              Alaska    Texas
CLASS SIZE 35% OF GRADE
#1 % 4th graders in classes of 25 or fewer students (1996)    64        97
#2 % 8th graders in math classes of 25 or fewer students
   (1996)                                                     53        65
STUDENT ENGAGEMENT 20% OF GRADE
% of 8th graders in schools reporting that
      #3 Absenteeism is not a problem or is a minor
         problem (1996)                                       80        71
      #4 Tardiness is not a problem or is a minor
         problem (1996)                                       80        72
      #5 Classroom misbehavior is not a problem or is a
         minor problem (1996)                                 64        62
PARENT INVOLVEMENT 20% OF GRADE
% of 8th graders in schools reporting that
      #6 Lack of parent involvement is not a problem or is
         a minor problem (1996)                               55        55
      #7 Majority of parents attend open-house or
         back-to-school nights (1996)                         79        58
      #8 Majority of parents attend parent-teacher
         conferences (1996)                                   78        49
SCHOOL AUTONOMY 25% OF GRADE
#9 State permits or requires site-based management of
   schools (1998)                                             Yes  A    Yes      A
#10 Statewide public school open-enrollment program (1998)    No   F    Limited  C
#11 State law allows charter schools (1998)                   Yes  A    Yes      A
#12 How strong is the charter school legislation? (1998)      Weak F    Strong   A
#13 State grants waivers of education regulations (1998)      Yes  A    Yes      A
SCHOOL SIZE (Ungraded)
% high school students in schools of 900 or fewer students
   (1996)                                                     41        22
% elementary students in schools of 350 or fewer students
   (1996)                                                     18        8

Scoring protocols for the School Climate category in item order

1 Score = the % of 4th-grade classes with 25 or fewer students.
2 Score = the % of 8th-grade math classes with 25 or fewer students.
3 Score = the % of 8th graders reporting absenteeism is not a problem.
4 Score = the % of 8th graders reporting tardiness is not a problem.
5 Score = the % of 8th graders reporting misbehavior is not a problem.
6 Score = the % of 8th graders reporting lack of parental involvement is not a problem.
7 Score = the % of 8th graders reporting parents attend back-to-school nights.
8 Score = the % of 8th graders reporting parents attend parent-teacher conferences.
9 States that permit site-based management = A, do not permit = F.
10 Enrollment anywhere = A, enrollment limited = C, no choice = F.
11 Allow charter schools = A, do not allow = F.
12 Strong charter school laws (as rated by the Center for Education Reform) = A, weak laws = F.
13 Grant waivers = A, no waivers = F.

Resources

                                                              Alaska    Texas
ADEQUACY 33% OF GRADE
#1 Education spending per student, adjusted for regional
   cost differences (1997)                                    $6,601    $5,889
#2 % change in inflation-adjusted education spending per
   student (1987-97)                                          -18       23
#3 % of total taxable resources spent on education (1996)     4.4       3.9
ALLOCATION 33% OF GRADE
#4 % of annual education expenditure spent on instruction
   (1996)                                                     56.6      61.4
EQUITY 33% OF GRADE
#5 Relative inequity in spending per student among
   districts (1995)                                           31.9%     12.5%

Scoring protocols for the Resources category in item order

1 Per-pupil expenditure (PPE) was adjusted using the "geographic cost-of-education index" from the National Center for Education Statistics. The benchmark for 1999 was $7,369. Each state's adjusted PPE was divided by $7,369 to obtain the number of points out of 100.
2 The percent change in inflation-adjusted education spending per student was calculated by subtracting each state's inflation-adjusted 1987 PPE from its 1997 PPE and dividing that difference by the inflation-adjusted 1987 PPE. 100 percent was given to states that increased per-student spending by at least 20 percent over inflation; 85 percent, to those who raised it 15 to 19 percent; 75 percent, to those who raised it 10 to 14 percent; 65 percent, to those who raised it 5 to 9 percent; 50 percent, to those either keeping up with or spending up to 5 percent over inflation; 0 was given to those who did not keep up with inflation.
3 Percent of total taxable resources spent on education was calculated by dividing the combination of a state's local and state-level education revenues for 1995-96 by its gross state product for 1995. Five percent of state wealth was used as a benchmark to define a perfect score. The percent of a state's wealth spent on education was then divided by that benchmark to assign each state a grade.
4 Allocation: Percent of annual expenditures spent on instruction refers to spending directly related to the interaction between teachers and students, such as teacher salaries and classroom supplies. The following grading benchmarks were used: 70 percent or greater is an A; 69 to 69.9 percent is an A-minus; 68 to 68.9 percent is a B-plus; 66 to 67.9 percent is a B; 65 to 65.9 percent is a B-minus; 64 to 64.9 percent is a C-plus; 62 to 63.9 percent is a C; 61 to 61.9 percent is a C-minus; 60 to 60.9 percent is a D-plus; 58 to 59.9 percent is a D; 57 to 57.9 percent is a D-minus; less than 57 percent is an F.
5 Equity: The relative inequity among districts in spending per student was calculated by Management Analysis and Planning Inc., or MAP, for Quality Counts using the U.S. Census Bureau's F-33 database. The measure used, the coefficient of variation, summarizes how widely spending across a state's districts varies from the average per-pupil spending within a state. We adjusted each district's spending to account for its poor and special-education students and the differing costs of hiring teachers and purchasing supplies. We excluded districts with fewer than 200 students from our calculations and assigned special weights to nonunified districts.

We used the following grading benchmarks: 1 to 3.9 percent variation is an A; 4 to 4.9 percent variation, an A-minus; 5 to 5.9 percent variation, a B-plus; 6 to 8.9 percent variation, a B; 9 to 9.9 percent variation, a B-minus; 10 to 10.9 percent variation, a C-plus; 11 to 13.9 percent variation, a C; 14 to 14.9 percent variation, a C-minus; 15 to 15.9 percent variation, a D-plus; 16 to 18.9 percent variation, a D; 19 to 19.9 percent variation, a D-minus; 20 percent or greater variation, an F. A detailed, step-by-step description of the analysis may be obtained on our WWW site.
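
A sketch of the arithmetic in protocols 1-3, using the Alaska and Texas figures from the table above (helper names are mine). Curiously, averaging just these three adequacy scores lands exactly on the summary Resources grades published in Appendix 1 (59 and 86); whether that is how EWW actually combined its subcategories I cannot confirm. Protocol 5's equity measure is sketched at the end.

    # Sketch of Resources protocols 1-3 and 5 (function names are mine).
    import statistics

    BENCHMARK_PPE = 7369.0    # NCES cost-adjusted per-pupil benchmark (protocol 1)
    WEALTH_BENCHMARK = 5.0    # % of state wealth defining a perfect score (protocol 3)

    def pct_change_points(change):
        # Benchmarks from protocol 2.
        if change >= 20: return 100
        if change >= 15: return 85
        if change >= 10: return 75
        if change >= 5:  return 65
        if change >= 0:  return 50
        return 0                      # did not keep up with inflation

    for state, ppe, change, wealth in [("Alaska", 6601, -18, 4.4),
                                       ("Texas",  5889,  23, 3.9)]:
        scores = [100 * ppe / BENCHMARK_PPE,        # protocol 1
                  pct_change_points(change),        # protocol 2
                  100 * wealth / WEALTH_BENCHMARK]  # protocol 3
        print(state, round(sum(scores) / 3))        # Alaska 59, Texas 86

    def coefficient_of_variation(district_spending):
        # Protocol 5's measure: spread relative to the mean, in percent
        # (population s.d. shown; MAP's exact formula isn't specified).
        return 100 * statistics.pstdev(district_spending) / statistics.mean(district_spending)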



Appendix 3: NAEP Achievement Data and Ed Week 1998 Grades for the 50 States

[N.B. the matrix below combines data from 5 different online tables (URLs listed in note 3 above and bibliography below) that consisted of EWW's 1998 report cards for the 50 states: I sorted and merged data from the five EWW tables into one on the shell and analyzed the resulting file with STATA (paste these data into your stat package and type its syntax for corr ep4_m96 - stand98 to get the coefficients shown in Table 1 above; plot stand98, h to get Figure 2)]
|State           |96: NAEP Math     |98: EWW "Input" Grades
|                |ep4_m96  ep8_m96  |res98  tqual98  sclim98  stand98
Alabama             11       12       80      76       65       88
Alaska              21       30       59      69       70       67
Arizona             15       18       45      78       64       86
Arkansas            13       13       82      73       72       71
California          11       17       42      85        .       80
Colorado            22       25       45      83       70       72
Connecticut         31       31       84      93       83       78
Delaware            16       19       80      78       62       85
Florida             15       17       71      84       57       92
Georgia             13       16       76      81       64       89
Hawaii              16       16       67      76       52       60
Idaho                .        .       82      66        .       57
Illinois             .        .       72      69        .       83
Indiana             24       24       88      84       69       81
Iowa                22       31       81      73       79       39
Kansas               .        .       78      75        .       90
Kentucky            16       16       82      89       62       89
Louisiana            8        7       69      86       62       80
Maine               27       31       95      75       82       79
Maryland            22       24       82      83       52       93
Massachusetts       24       28       75      85       76       91
Michigan            23       28       90      86       65       81
Minnesota           29       34       78      83       72       70
Mississippi          8        7       78      75       65       77
Missouri            20       22       69      84       71       72
Montana             22       32       83      73       71       50
Nebraska            24       31       86      73       82       72
Nevada              14        .       70      75        .       86
NewHampshire         .        .       85      77        .       85
NewJersey           25        .      100      78        .       80
NewMexico           13       14       70      78       67       94
NewYork             20       22       84      85       63       95
NorthCarolina       21       20       72      93       64       89
NorthDakota         24       33       73      72       83       52
Ohio                 .        .       85      80        .       86
Oklahoma             .        .       78      92        .       70
Oregon              21       26       76      72       64       91
Pennsylvania        20        .       89      73        .       86
RhodeIsland         17       20       83      82       76       70
SouthCarolina       12       14       79      92       66       85
SouthDakota          .        .       68      72        .       75
Tennessee           17       15       65      81       64       68
Texas               25       21       86      78       76       88
Utah                23       24       75      74       60       72
Vermont             23       27       87      79       81       69
Virginia            19       21       75      83       68       92
Washington          21       26       73      76       60       77
WestVirginia        19       14       98      84       69       92
Wisconsin           27       32       91      83       79       84
Wyoming             19       22       56      66       74       73

Bibliography

Last updated for link rot 7/07

Allen, M. and Yen, W. Introduction to Measurement Theory. Belmont, CA: Wadsworth (1979)

Allen, N., Jenkins, F., Kulick, E., and Zelenak C. Technical Report of the NAEP 1996 State Assessment Program in Mathematics. Washington, DC: National Center for Educational Statistics (1997)

Association for Supervision and Curriculum Development. Nonrandom Human Error in Testing (5 Aug 03)

ASR-CAS Joint Study Group. Making Valid and Reliable Decisions in Determining Adequate Yearly Progress. Washington, D.C.: Council of Chief State School Officers (02)

Barton, P.B. Too Much Testing of the Wrong Kind in K-12 Education. ETS: Princeton, NJ (6 June 1999)

Berkowitz, H. Communities Must Step in for Kids,Anchorage Daily News (18 Nov 02)

Boruch, R. The Virtues of Randomness, EducationNext (Fall 02)

Trends in International Mathematics and Science Study (TIMSS), 1995, 1999, 2003

Bushweller, K. Teaching to the Test, American School Board Journal (Sept 1997)

Callahan, R. Education and the Cult of Efficiency. Chicago: University of Chicago Press (1962)

Campbell, D. and Fiske, D. Convergent and Discriminant Validity in the Multitrait-Multimethod Matrix. Psychological Bulletin, 56, 81-105 (1959)

Campbell, D. and Stanley, J. Experimental and Quasi-Experimental Designs for Research. New York: Houghton Mifflin (1966)

Cook, T. and D. Campbell Quasi-Experimental Design and Analysis Issues for Field Settings. Chicago: Rand McNally (1979)

Cremin, L. The Transformation of the School. New York: Vintage (1961)

Cronbach, L. J. Designing Evaluations of Educational and Social Programs. San Francisco: Jossey Bass (1982)

Digest of Educational Statistics, Scholastic Assessment/Aptitude Test Score Averages, by State: 1974-75 to 1994-95 (1998)

Dillon, S. Thousands of Schools May Run Afoul of New Law, New York Times (16 Feb 03)

Education Commission of the States. State NCLB Plan Links (03), Educational Accountability Under NCLB, and Educational Accountability Under NCLB Revisited

Elmore, R. Unwarranted Intrusion, EducationNext (Spring 02)

Figlio, D. Aggregation and Accountability, No Child Left Behind: What Will It Take? Conference, Fordham Foundation, Washington D.C. (Feb 02).

Folstein, M., Folstein, S. and McHugh, P. "MINI-MENTAL STATE:" A Practical Method for Grading the Cognitive State of Patients for the Clinician, Journal of Psychiatric Research, 12 (3), 189-198 (1975)

Fullerton, K. T for Texas Technology Skills, Wired (4 Dec 1999)

Goldhaber, D. What Might Go Wrong with the Accountability Measures, No Child Left Behind: What Will It Take? Conference, Fordham Foundation, Washington D.C. (Feb 02)

Haney, W. The Myth of the Texas Miracle in Education, Education Policy Analysis Archives (19 Aug 00)

Hannaway, J. and McKay, S. School Accountability and Student Achievement: The Case of Houston EducationNext (Fall 01)

Hayes, W. L. Statistics. Fort Worth: Harcourt Brace, 5th Ed. (1994)

Henriques, D. and Steinberg, G. None of the Above, FairTest (20 May 01)

Hensley, W. Speech by Willie Hensley at Bilingual Conference. Bilingual Conference: Anchorage, AK (Feb 1981).

Herrnstein, R. and Murray, C. The Bell Curve. New York: The Free Press (1994)

Hoff, D. NAEP Weighed as Measure of Accountability, Education Week on the Web (8 Mar 00)

Test-Weary Schools Balk at NAEP, Education Week on the Web (16 Feb 00)

Indiana State Legislature. House Bill 246 (1897)

Innes, R. Message (16 June 02)

Jaeger, R. and Tucker, C. A Guide to Practice for Title 1 and Beyond. Washington, D.C.: Chief State School Officers (1998)

Jesness, J. Stand and Deliver Revisited: The untold story behind the famous rise -- and shameful fall--of Jaime Escalante, America's master math teacher. Reasononline (July 02)

Johnson-Lewis, M. Testing Head Start to Death, Black Commentator (20 Feb 03)

Joint Committee on Testing Practices (American Educational Research Association, American Psychological Association, National Council on Measurement in Education). Standards for Educational and Psychological Testing. Washington, DC (1999)

Kane, T., Staiger, D., and Geppert, J. Randomly Accountable, EducationNext (Fall 02)

Klein, S., Hamilton, L., McCaffrey, D., and Stecher, B. What Do Test Scores in Texas Tell Us? Santa Monica, CA: RAND (00)

Krenz, C. Alaska's HSGQE Web Resources Page (03)

Koret Task Force on K-12 Education. School Accountability. Hoover Institution (02)

Kuhn, T.S. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 2nd Ed. (1962)

Lynn, R. Utah Education, The Salt Lake Tribune (21 Nov 03)

McNeil, L. Sameness, Bureaucracy and the Myth of Educational Equity: The TAAS System of Testing in Texas, Hispanic Journal of Behavioral Sciences (00)

Meehl, P. Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology. Journal of Consulting and Clinical Psychology, 46, 806-834. (1978)

Nasaw, D. Schooled to Order. New York: Oxford University Press (1979)

Neuman, S. Letter to Chief State School Officers. Washington, DC: USED (5 Dec 02).

Olson, L. NAEP Board Worries States Excluding Too Many From Tests, Education Week on the Web (19 March 03)

Olson, L. Board Acts to Bring NAEP In Line With ESEA, Education Week on the Web (29 March 02)

Olson, L. Shining a Spotlight on Results: Quality Counts '99, Education Week on the Web, 18 (17) (1999). Sections: Achievement, Methodology, Resources, Standards and Assessment, School Climate, Teacher Quality

Platt, J. Strong Inference. Science (15 Oct 1964)

Rhodes, K. and Madaus, G. Errors in Standardized Tests: A Systematic Problem. National Board on Educational Testing and Public Policy Monograph. Boston, MA: Boston College, Lynch School of Education. (2003)

Sagan, C. The Fine Art of Baloney Detection. Chapter in The Demon-Haunted World, Science as a Candle in the Dark. New York: Random House (1996)

Scientifically Correct

Scott, J. Let The Mock Trial Begin, EducationNews.org (27 Mar 03)

Stofflet, F., Fenton, R., and Straugh, T. Construct and Predictive Validity of the Alaska State High School Graduation Qualifying Examination: First Administration, Conference, American Educational Research Association, Seattle WA, Apr 01

Student. Errors of Routine Analysis. Biometrika, 19, 151-164 (1927)

Taylor, F. The Principles of Scientific Management. New York: Norton (1911)

Texas' Exit Exam Page

Thomas, C. "Education Vouchers Bode Well for Public Schools" (syndicated column), Anchorage Daily News (19 May 03)

Wise, A. Legislated Learning. Berkeley: University of California Press (1979)

Winer, B.J., Statistical Principles in Experimental Design. New York: McGraw-Hill, 2nd Ed. (1971)

Winerip, M. Defining Success in Narrow Terms, New York Times (19 Feb 03)




nces.ed.gov/nationsreportcard/mathematics/results2003/stateachieve-g8 ("Percentage of students at or above Proficient in mathematics, grade 8 public schools: By state, 1990-2003")

nces.ed.gov/ecls/pdf/Birth/dadMatrix.pdf
nces.ed.gov/ecls/pdf/birth/resDad24.pdf (9- and 24-month fatherhood questionnaires)


www.nclb.gov (an NCLB home page)

www.nochildleftbehind.gov/next/where/alaska.html (what the NCLB says about Alaska)


www.house.gov/reform/tapps/hearings/3-21-02/deptofeducation.htm (THOMAS, USED testimony about the "Performance Based Data Management Initiative" at Congressional hearing, 02)
www.whitehouse.gov/omb/budget/fy2003/bud13.html
www.ed.gov/about/overview/budget/budget05/summary/edlite-section4.html (e-gov budgets, with the "Performance Based Data Management Initiative" being fully funded in both 03 and 04)


www.ed.gov/policy/elsec/leg/esea02 (ESEA main page; if this is the actual law, then its date is 2001)

www.ed.gov/policy/elsec/leg/esea02/pg8.html#sec1308 (SEC. 1308, prohibition on creating a nationwide database except for "coordinating migrant education activities")

www.ed.gov/policy/elsec/leg/esea02/pg27.html#sec2304 (Section 2304, "Troops for Teachers" funding)

www.ed.gov/policy/elsec/leg/esea02/pg27.html#sec2307 (Section 2307, "Troops for Teachers" evaluation plan)

www.ed.gov/policy/elsec/leg/esea02/pg97.html#sec411 (Section 411, NAEP changed authorization)

www.ed.gov/policy/elsec/leg/esea02/pg112.html#sec9526 (SEC. 9526, "Nothing in this section shall be construed to ... prohibit the distribution of scientifically or medically true or accurate materials" means?? could it be related to the last two entries below?)

www.ed.gov/policy/elsec/leg/esea02/pg112.html#sec9531 (SEC. 9531, prohibition against a single nationwide database)


www.ed.gov/admins/lead/read/ereadingsbr03/ecreading03.pdf (E-gov's "Scientifically Based Research," a pile of links to pictorial slides, no bibliography, 03)

www.ed.gov/offices/IES
(E-gov statement about sacking OERI and creating the "Institute of Education Sciences." 02)

www.ed.gov/about/reports/annual/2002report/obj.doc (Press Release about Evaluation, 02: a document claiming that "advertising research" shows "continuous quality improvement"--evidently written by someone with an advertising background rather than a substantive methodological or statistical one, someone who didn't know what they were talking about)

www.ed.gov/admins/tchrqual/learn/preparingteachersconference/whitehurst.html (Whitehurst, G. Teacher Preparation and Professional Development, White House Conference on Preparing Tomorrow's Teachers (02))

www.ed.gov/offices/OII/fpco (Summary of rules regarding student record privacy)


web99.ed.gov/GTEP/Program2.nsf (appears as a link to an "Introduction from Secretary Rod Paige" but is really the following code: "javascript:_doClick('blahblah/$Body/0.5a68')," n.d. ... I do see a "doubleclick" in the preceding--which helps explain the idiocy of the press release above).