Claudia Krenz, Ph.D. (datafriend @ gmail-.-com)
This page summarizes the input-output model on which the educational accountability testing movement is based (high standards, a positive school climate, qualified teachers, and adequate resources lead to higher student achievement). It explicates how Education Week on the Web (EWW), a publication of the Fordham Foundation, operationalized or measured each of the four "inputs"--high standards, positive school climate, qualified teachers, and adequate resources--and the one "output," student achievement. This page focuses on the standards and achievement variables.
High standards were operationalized by the degree to which each state had implemented standards-based accountability testing (from the 1998 "report cards" EWW assigned each of the states). The standards data were cumulative, reflecting how many accountability tests the states had implemented or were planning to implement. For student achievement, EWW used the percent scoring proficient or higher on the 1996 administration of the National Assessment of Educational Progress (NAEP) math test, during the time period before the NAEP lost its scientific governance.
Here is a sorted list showing the degree of standards-based accountability testing implemented by state (high scores on this list indicate more accountability testing, i.e., more students were taking more state-based achievement tests in more subjects). Here is a sorted list showing the % proficient on NAEP 8th grade math by state (high scores indicate higher student achievement, i.e., more students scoring at the proficient level or above). Results on both lists for Alaska, California, and Texas are highlighted for comparative purposes: More Alaska students, for example, scored at the proficient level or higher than did their peers in Texas. Alaska students, however, took far fewer state-based accountability tests than did their peers in Texas.
Student achievement was correlated with standards, as well as with the other 3 "inputs." The correlation coefficient between student achievement and standards-based accountability testing was -.39 on the NAEP 8th grade math test and -.20 on the 4th grade NAEP math test. These statistically significant correlations run in the direction opposite to that postulated by the input-output model.
Different interpretations of this coefficient are presented and discussed. It may be, for example, that implementing accountability testing has the opposite effect of that intended, i.e., that it leads to decreased student achievement. Or it may be, since the achievement test data came from 1996 and the standards data from 1998, that lower student achievement leads legislators to mandate more accountability testing. The examples of Alaska--which only began implementing accountability testing in 1998--and Texas--which began its accountability testing programs in the early 1990s--support the former interpretation, as does the fact that the negative correlation for 8th graders--who would have been exposed to more accountability testing--is nearly double that for 4th graders.
The model shown in Figure 1 to the left below illustrates a commonly held view of student achievement: it increases as high standards, a positive school climate, qualified teachers, and adequate resources increase. Table 1 to the right below shows statistically significant negative correlations between student achievement and implementation of standards-based accountability testing, -.39 (8th graders) and -.20 (4th graders).1 A negative correlation is opposite that expected by proponents of standards-based reforms, whose goal is increasing student achievement (something on which we all agree). This negative correlation raises the question of whether increased accountability testing is having its intended palliative effect: rather, the negative correlation suggests that increasing the time spent preparing for and taking accountability tests could be having an effect opposite its intent.
Figure 1. Model of Factors or "Inputs" Thought to Facilitate Student Achievement.*
Table 1. Pearson Product-Moment Correlations Between Model Variables, Truncated to Two Decimal Places: NAEP 1996 Math (% of 4th and 8th Graders Scoring Proficient) and EWW 1998 Grades for Each State (as published online in the State "Report Cards" Special Section). I used a UNIX script to extract and sort the data by state and then analyzed them with STATA (gr stand98, hist; corr ep_m4-stand98), using the raw data matrix in Appendix 3 below.
| | Achievement: NAEP Math (% proficient) | | "Inputs" (from EWW State Grades) | | | |
| | 4thGrade | 8thGrade | SResources | TQuality | SClimate | STesting |
| Achievement:4thGraders | 1.00 | | | | | |
| Achievement:8thGraders | 0.90 | 1.00 | | | | |
| SchoolResources | 0.28 | 0.17 | 1.00 | | | |
| TeacherQuality | 0.04 | -0.17 | 0.19 | 1.00 | | |
| SchoolClimate | 0.53 | 0.55 | 0.27 | -0.15 | 1.00 | |
| StandardsTesting | -0.20 | -0.39 | 0.11 | 0.47 | -0.36 | 1.00 |
* Input-output models of achievement have influenced U.S. public education since its inception (Callahan, 1962; Cremin, 1961; Nasaw, 1979; Wise, 1979). North Dakota, for example, one of the late adopters,2 announced its new standards-based accountability testing program online, saying it would "assure businesses that prospective employees are well educated and prepared for the requirements of the business world." Standards-based reform, a phenomenon of the last dozen-plus years, is "results oriented," ostensibly focusing on the "output," achievement.
Other interpretations of this negative correlation include:
The relationship between legislated reforms and student achievement could be illuminated by using more recent NAEP scores, the problem being, as noted above, that the NAEP has lost its scientific governance.
Correlations are, in any case, never conclusive, and the preceding--whatever its interpretation--is no exception. In summary, the empirical results shown in Table 1, from intercorrelating the NAEP achievement variables with the EWW variables, raise a question about one of the assumptions of standards-based reform. Said results, being at best correlation coefficients between two solid, substantive variables (NAEP math % proficient and the EWW standards variable), do not, of course, "prove" anything. Perhaps we should, as did the testing wing of the National Research Council when first convened in 1997 to discuss a national test, focus as much on minimizing potential risks and unintended consequences as on maximizing intended benefits. Myself, I'm watching for more data: these data suggest that one unintended consequence of the individual state implementations of accountability programs may have been to decrease achievement by reducing the amount of class time devoted to learning and thus students' opportunities to learn. Could it be that accountability testing is, like kudzu, strangling student learning?
1 Data. Achievement data are from the 1996 National Assessment of Educational Progress Math tests (see here for a crosstabulation of states by testing intervals: 1990, 1992, 1996, 2000, and 2003). More recent NAEP data are not used here because, as with the Iraq pre-war intelligence, there are doubts about them, and it is only sensible to doubt. Standards implementation data are from the "report card" Education Week on the Web (EWW) gave each state in 1998. EWW based its grades on the factors thought to enhance student achievement: implementation of accountability testing, i.e., having "standards" (and teacher quality, school resources, and school climate). EWW created its standards variable by summing the number of different standards-based reforms each state had implemented: 4 points for states administering diploma sanctioning tests, 2 for those that are planning them; 3 points for those administering accountability testing in reading, writing, and math--which is where all the states started--and an extra point for states with a social studies test, etc. The Appendices summarize how EWW conceptualized these "inputs" (and illustrate them with the grades it gave to Alaska and Texas). Appendix 1 shows how EWW conceptualized the phenomena of school resources, teacher quality, school climate, and standards or accountability testing implementation. Appendix 2 quotes EWW's methods section and shows its calculation procedures (illustrated with the "grades" EWW assigned to two states, AK and TX). Appendix 3 shows the raw data from which the correlation matrix was generated.
--
* In Table 1's correlation matrix, as expected, the highest
positive correlation, .90, is between the two achievement variables:
levels of student achievement, expressed as the % proficient on a
particular standardized test, vary across states, i.e., the %s proficient
of 4th graders and 8th graders within a state are more like each other
than they are like those in other states (eyeball the 4th and 8th
grade %s proficient columns across the 50 state rows in Appendix 3
to see for yourself). Also not surprising, the achievement and
"school climate" variables are highly correlated. The low
correlations between the achievement variables and the resources and
teacher quality variables may be due to the latter being noisy, messy
variables (scan EWW's computational procedures in Appendix 2 to judge
for yourself).
2 That the standards implementation variable is skewed suggests another possible explanation for the negative relationship between student achievement and legislator-imposed testing: although the general linear model is robust to violations, one of its assumptions is that variables be normally distributed.
| The plot to the right illustrates the degree to which the different states had implemented standards-based reforms--required more accountability testing--by 1998. The plot is "negatively skewed," meaning that, by 1998, most states had jumped onto the standards bandwagon.* |
Figure 2. STATA Plot of 1998 EWW Standards Variable **
|
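The direction of the skew can also be checked numerically. The following is a minimal Python sketch, not the STATA procedure behind Figure 2: it assumes the stand98 column of Appendix 3 has been transcribed correctly, and it uses the simple moment-based skewness coefficient (the function name `skewness` is only illustrative).

```python
# stand98 column from Appendix 3, in the table's state order (Alabama ... Wyoming).
# Assumption: values were copied by hand from the appendix; recheck before relying on them.
STAND98 = [
    88, 67, 86, 71, 80, 72, 78, 85, 92, 89,
    60, 57, 83, 81, 39, 90, 89, 80, 79, 93,
    91, 81, 70, 77, 72, 50, 72, 86, 85, 80,
    94, 95, 89, 52, 86, 70, 91, 86, 70, 85,
    75, 68, 88, 72, 69, 92, 77, 92, 84, 73,
]


def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # A negative value means a long tail toward low scores, i.e. most
    # states sit at the high (more testing) end of the scale.
    print(f"n = {len(STAND98)}, skewness = {skewness(STAND98):.2f}")
```

A negative coefficient here corresponds to the "negatively skewed" shape described for the plot; other skewness estimators (for example, bias-corrected ones) would give somewhat different magnitudes.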
In particular, by last century's end, most U.S. states had independently implemented "standards-based" reforms, specifying what students needed to master and testing them. Students were held accountable by having to pass "exit exams" to graduate from high school; schools were held accountable by being assigned "report cards" formed by aggregating their students' scores (ignoring those demographic characteristics with which the school is associated).
Just so, were we to aggregate WMD over gender--just like homebrew test scores aggregated over schools--we'd do it individual by individual, 1 meaning some kind of access, 0 meaning none whatsoever. Since we coded 0s and 1s, the sum of scores divided by the number of individuals is equivalent to a percentage: the percentage of men and the percentage of women with access to WMD. Common sense says that the percentage of men with access to WMD will be much greater than the percentage of women (myself, I don't know any such women, but there have to be some: otherwise would mean the impossible, division by 0). Would it be reasonable to interpret this difference in proportions to mean that men have WMD "personalities," "streaks," "characteristics," traits?
|
| Early Adopters By the early 1990s a dozen U.S. states had implemented such reforms. One such adopter was Texas. Another was California, which--legislatively divided, it is said, on whether learning algebra was important--jumped onto a "new" new-age math standard ... and onward into the academic cellar, giving rise to parental protests, one in the form of the Mathematically Correct Home Page ("yes, Virginia," it says, "there is a right answer"). Another early adopter, Virginia, mandated *abracadabra* that its statewide test results account for students who had earlier failed but, after remedial work, passed--by adding their scores to the numerator of its overall passing rate but not their number to its denominator, inexorably inflating it, theoretically to over 100% (Goldhaber, 2002). |
Late Adopters By the late
1990s, only a few states had not jumped onto the bandwagon. Alaska
was a late adopter, though not as late as that unflappable continental
phalanx constituted by Idaho, Montana, and North Dakota.
Alaska jumped aboard because its legislature was
convinced by oil companies like Exxon, Tesoro, and Unocal
(headquartered in CA and TX) that its students were insufficiently
educated to be "good" workers ...
|
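The pass-rate arithmetic attributed to Virginia above is easy to work through. The sketch below is only an illustration with made-up numbers (100 first-time takers, 90 of whom pass, plus 20 students who failed earlier and pass after remedial work); the function names are hypothetical and nothing here reproduces Virginia's actual figures.

```python
def numerator_only_pass_rate(first_time_takers, first_time_passers, remediated_passers):
    """Pass rate (%) when remediated passers are added to the numerator
    but their number is not added to the denominator."""
    return 100.0 * (first_time_passers + remediated_passers) / first_time_takers


def conventional_pass_rate(first_time_takers, first_time_passers,
                           remediated_takers, remediated_passers):
    """Pass rate (%) when remediated students count in both numerator and denominator."""
    passers = first_time_passers + remediated_passers
    takers = first_time_takers + remediated_takers
    return 100.0 * passers / takers


if __name__ == "__main__":
    # Hypothetical cohort: 100 first-time takers, 90 pass;
    # 20 earlier failers retake after remediation and all 20 pass.
    inflated = numerator_only_pass_rate(100, 90, 20)
    conventional = conventional_pass_rate(100, 90, 20, 20)
    print(f"numerator-only rate: {inflated:.1f}%")
    print(f"conventional rate:   {conventional:.1f}%")
```

With these invented inputs the numerator-only rate exceeds 100%, which is the theoretical possibility the text describes, while the conventional rate stays below it.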
3 The NCLB is up for renewal. Should it be renewed? I think not! The NCLB, like the Iraq War, had broad bipartisan support when signed into law in January 2002. It, like the Iraq War, was under-budgeted but, like the Iraq War, lined the pockets of some (test publishers for example) at the expense of others (the schools themselves). It, like the Iraq War, raised issues of sovereignty: who is in charge? No one, apparently.
Take one silly example, a press release in which the current Secretary of Education chastised Utah--which had, in 2005, openly challenged parts of the NCLB with its House Bill 1001 (unlike Alaska, which caved completely)--because the percentages of students passing its homebrew accountability test weren't identical to those on the federal NAEP test. As the two tests--federal and state--had not been standardized to each other in any way, the wonder would have been had the actual numbers (%s passing) been identical! There is no reason to expect better decisions from USED than from, say, FEMA (USED was, though, better prepared to have new schools across the country enroll children made refugees by Hurricane Katrina [it began sending out letters to chief school officers on 9/2/04]). |
[N.B. This appendix is a paste--with HTML "tags" added--from the online tables showing EWW's 1998 "input" items and grades for the 50 states (enumerated in note 3 above). Its sole purpose is didactic.]
The first EWW item within
each "input" category and its scoring
protocol (again using Alaska and Texas as examples) are discussed below.
For navigational ease, click on the grade or scoring link
within each bulleted discussion to go to Alaska's grade and the first item's scoring protocol [click on the light-colored
button there to get back here].
All the individual items EWW used--and their scoring protocols--merged with EWW's grades for Alaska and Texas, are shown in the four tables below.

These four tables paste all the individual items EWW posted for each "input" category (merged with its grades for Alaska and Texas). Each "input" table is followed by EWW's scoring protocols. The purpose of this appendix is to facilitate--by combining into one file what had been presented in five--the reader's understanding of how EWW calculated these grades.
|
Abbreviations Subject Areas E = English; M = Math; S = Science; SS = Social Studies.
School Level
Assessment Types
Other | ||||
| Alaska | Texas | |||
| ASSESSMENT 30% OF GRADE | ||||
| #1 How does the state measure student performance? (Fall 1998) | NRT | D | CRT,WR,PRF | A + 3 points |
| #2 Which subjects are tested using assessments aligned to state's standards? (Fall 1998) | None | F | E,M,S,SS | A |
| STANDARDS 50% OF GRADE | ||||
| #3 Has the state adopted standards in the 4 core academic subjects? (December 1998) | Yes | A | Yes | A |
| #4 How clear and specific are the state's English/ language arts standards? (Fall 1998) | nada | 0 | EH | 2 |
| #5 How clear and specific are the state's mathematics standards? (Fall 1998) | EMH | 3 | EMH | 3 |
| #6 How clear and specific are the state's science standards? (Fall 1998) | n/a | 0 | EM | 2 |
| #7 How clear and specific are the state's social studies standards? (Fall 1998) | n/a | 0 | nada | 0 |
| 3/12 = | .25 | 7/12 = | .58 | |
| ACCOUNTABILITY 20% OF GRADE | ||||
| #8 Students must master 10th grade standards to graduate (Fall 1998) | Future | C | No | F |
| #9 Did the state participate in the 1998 NAEP exams? | No | F | Yes | A |
| #10 How does the state hold schools accountable for performance?(November 1998) | ||||
| Report Cards | Yes | A | Yes | A |
| Ratings | No | F | Yes | A |
| Rewards | No | F | Yes | A |
| Assistance | No | F | Yes | A |
| Sanctions | No | F | Yes | A |
Scoring protocols for the Standards & Assessment category in item order
1 PRF = A; CRT = B if aligned to state goals; CRT = C if not so aligned; NRT = D; none = F; 3 extra points for WR.
2 4 basic subjects = A, 3 = B, 2 = C, 1 = D, none = F.
3 4 = A, 2-3 = B, 1 = C, under development = D, none & no plans = F.
#4 - #7 Used the AFT's "Making Standards Matter" (1998) to content analyze the clarity and specificity of the different standards. The ratings were then summed and divided by 12.
8 Yes = A, Future = C, No = F [2.5% of grade].
9 Yes = A, No = F [2.5% of grade].
10 Yes = A, No = F for each of the 5 ways of holding schools accountable [15% of grade].
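The protocol for items #4 - #7 can be sketched in a few lines of Python. This is an illustration rather than EWW's own procedure: the ratings are the Alaska and Texas values shown in the table above, and reading the divisor of 12 as 4 subjects times 3 school levels is an assumption inferred from ratings such as EMH = 3, not something the source states.

```python
# Clarity/specificity ratings for items #4-#7, as listed in the table above.
CLARITY_RATINGS = {
    "Alaska": {"English": 0, "Math": 3, "Science": 0, "Social Studies": 0},
    "Texas": {"English": 2, "Math": 3, "Science": 2, "Social Studies": 0},
}

# Divisor from the scoring protocol; assumed here to be 4 subjects x 3 levels.
MAX_POINTS = 12


def clarity_score(ratings):
    """Sum the four subject ratings and divide by 12."""
    return sum(ratings.values()) / MAX_POINTS


if __name__ == "__main__":
    for state, ratings in CLARITY_RATINGS.items():
        total = sum(ratings.values())
        print(f"{state}: {total}/{MAX_POINTS} = {clarity_score(ratings):.2f}")
```

The two results agree with the 3/12 = .25 and 7/12 = .58 row shown in the table for Alaska and Texas.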
|
Abbreviations BASIC = basic skills, PED = pedagogy, SUBJ = subject matter, PROG= state adopted teacher competency standards which hold teacher training programs responsible for them. | ||||
| Alaska | Texas | |||
| PERFORMANCE-BASED LICENSING SYSTEM 40% OF GRADE | ||||
| #1 State has adopted standards for new teachers (1998) | Program | C | Yes | A |
| #2 State has assessment(s) to measure whether new teachers meet standards (1998) | BASIC | D | PED,SUBJ | A |
| #3 State requires and funds an induction program for new teachers (1998) | No | F | No | F |
| #4 State requires assessment of the classroom performance of new teachers (1998) | No | F | No | F |
| #5 Number of national-board-certified teachers (1998) | 7 | 7 | ||
| #6 State provides incentives for teachers to seek national board certification (1998) | ||||
| License portability | Nada | 0 | nada | 0 |
| License renewal | Nada | 0 | nada | 0 |
| Fee supports | Nada | 0 | nada | 0 |
| Pay supplement | Nada | 0 | nada | 0 |
| C | C | |||
| IN-FIELD 20% OF GRADE | ||||
| #7 % secondary teachers who hold a degree in the subject they teach (1994) | 64 | 64 | 51 | 51 |
| PROFESSIONAL DEVELOPMENT 20% OF GRADE | ||||
| #8 State requires time for professional development (1998) | Yes | A | Yes | A |
| #9 State provides professional-development opportunities (1998) | Yes | A | Yes | A |
| #10 State provides funds for local professional-development activities (1998) | No | F | Yes | A |
| TEACHER EDUCATION 20% OF GRADE | ||||
| #11 State requires an academic major for certification of secondary teachers (1998) | Yes | A | Yes | A |
| #12 State requires K-12 standards be used in teacher education (1998) | No | F | Yes | A |
| #13 % new graduates from NCATE-accredited institutions (1997) | 0 | 0 | 49 | 49 |
| #14 State requires early and varied field experiences prior to student teaching (1998) | Yes | A | Yes | A |
| #15 State has a student teaching requirement (1998) | Yes | A | Yes | A |
Scoring protocols for the Teacher Quality category in item order
1 Standards stating what new teachers should know and do = A, having such standards but only using them to approve education schools or relying solely on education schools to see that teachers meet standards = C, no formal standards but working on it = D, no standards & no plans for any = F.
2 Testing subject-matter and teaching knowledge = A, testing only subject-matter = B, testing only teaching = C, testing basic skills = D, nothing = F.
3 Requiring new teachers to participate in induction programs including mentoring by experienced teachers = A, requiring induction programs for only some teachers or requiring but not funding programs = C, no induction program = F.
4 Tying assessment of classroom teaching to licensure = A, not assessing classroom teaching or assessing it but not tying it to licensure = F.
5 Not included in grade.
6 Providing at least one incentive = A, providing no incentives = C (seriously, this is what it said).
7 Score = the % of teachers holding degrees in their fields.
8 Districts which set aside days or accumulate credits for professional development = A, no time = F.
9 Providing money for state-level professional development = A, no earmarked money = F.
10 Providing money for local professional development = A, no earmarked money = F.
11 Requiring an academic major = A, requiring credit hours = C, no requirement = F.
12 Requiring teacher-preparation programs tied to K-12 academic standards = A, not doing so = F.
13 Score = the % of teachers from NCATE approved schools.
14 Requiring K-12 field experience prior to student teaching = A, not requiring but it's always part of teacher training = C, nothing = F.
15 Requiring student teaching = A, not requiring but it's part of teacher training = C, nothing = F.
| Alaska | Texas | |||
| CLASS SIZE 35% OF GRADE | ||||
| #1 % 4th graders in classes of 25 or fewer students (1996) | 64 | 64 | 97 | 97 |
| #2 % 8th graders in math classes of 25 or fewer students (1996) | 53 | 53 | 65 | 65 |
| STUDENT ENGAGEMENT 20% OF GRADE | ||||
| % of 8th graders in schools reporting that | ||||
| #3 Absenteeism is not a problem or is a minor problem (1996) | 80 | 80 | 71 | 71 |
| #4 Tardiness is not a problem or is a minor problem (1996) | 80 | 80 | 72 | 72 |
| #5 Classroom misbehavior is not a problem or is a minor problem (1996) | 64 | 64 | 62 | 62 |
| PARENT INVOLVEMENT 20% OF GRADE | ||||
| % of 8th graders in schools reporting that | ||||
| #6 Lack of parent involvement is not a problem or is a minor problem (1996) | 55 | 55 | 55 | 55 |
| #7 Majority of parents attend open-house or back-to-school nights (1996) | 79 | 79 | 58 | 58 |
| #8 Majority of parents attend parent-teacher conferences (1996) | 78 | 78 | 49 | 49 |
| SCHOOL AUTONOMY 25% OF GRADE | ||||
| #9 State permits or requires site-based management of schools (1998) | Yes | A | Yes | A |
| #10 Statewide public school open-enrollment program (1998) | No | F | Limited | C |
| #11 State law allows charter schools (1998) | Yes | A | Yes | A |
| #12 How strong is the charter school legislation? (1998) | Weak | F | Strong | A |
| #13 State grants waivers of education regulations (1998) | Yes | A | Yes | A |
| SCHOOL SIZE (Ungraded) | ||||
| % high school students in schools of 900 or fewer students (1996) | 41 | 22 | ||
| % elementary students in schools of 350 or fewer students (1996) | 18 | 8 | ||
Scoring protocols for the School Climate category in item order
1 Score = the % of 4th-grade classes with less than 25 students.
2 Score = the % of 8th-grade classes with less than 25 students.
3 Score = the % of 8th graders reporting absenteeism not a problem.
4 Score = the % of 8th graders reporting tardiness not a problem.
5 Score = the % of 8th graders reporting misbehavior not a problem.
6 Score = the % of 8th graders reporting lack of parental involvement not a problem.
7 Score = the % of 8th graders reporting parents attend back-to-school nights.
8 Score = the % of 8th graders reporting parents attend parent-teacher conferences.
9 Sites that permit site-based management = A, not permit = F.
10 Enrollment anywhere = A, enrollment limited = C, no choice = F.
11 Allow charter schools = A, not allow = F.
12 Strong charter school laws (as rated by the Center for Educational Reform) = A, weak laws = F.
13 Grant waivers = A, no grant waivers = F.
| Alaska | Texas | |||
| ADEQUACY 33% of Grade | ||||
| #1 Education spending per student, adjusted for regional cost differences (1997) | $6,601 | $6,601 | $5,889 | $5,889 |
| #2 % change in inflation-adjusted education spending per student (1987-97) | -18 | -18 | 23 | 23 |
| #3 % of total taxable resources spent on education (1996) | 4.4 | 4.4 | 3.9 | 3.9 |
| ALLOCATION 33% of Grade | ||||
| #4 % of annual education expenditure spent on instruction (1996) | 56.6 | 56.6 | 61.4 | 61.4 |
| EQUITY 33% of Grade | ||||
| #5 Relative inequity in spending per student among districts (1995) | 31.9% | 31.9% | 12.5% | 12.5% |
Scoring protocols for the Resources category in item order
1 Per-pupil expenditure (PPE) was adjusted using the "geographic cost-of-education index" from the National Center for Education Statistics. The benchmark for 1999 was $7,369. Each state's adjusted PPE was divided by $7,369 to obtain the number of points out of 100.
2 The percent change in inflation-adjusted education spending per student was calculated by subtracting each state's inflation-adjusted 1987 PPE from its 1997 PPE and dividing that difference by the inflation-adjusted 1987 PPE. 100 percent was given to states that increased per-student spending by at least 20 percent over inflation; 85 percent, to those who raised it 15 to 19 percent; 75 percent, to those who raised it 10 to 14 percent; 65 percent, to those who raised it 5 to 9 percent; 50 percent, to those either keeping up with or spending up to 5 percent over inflation; 0 was given to those who did not keep up with inflation.
3 Percent of total taxable resources spent on education was calculated by dividing the combination of a state's local and state-level education revenues for 1995-96 by its gross state product for 1995. Five percent of state wealth was used as a benchmark to define a perfect score. The percent of a state's wealth spent on education was then divided by that benchmark to assign each state a grade.
4 Allocation: Percent of annual expenditures spent on instruction refers to spending directly related to the interaction between teachers and students, such as teacher salaries and classroom supplies. The following grading benchmarks were used: 70 percent or greater is an A; 69 to 69.9 percent is an A-minus; 68 to 68.9 percent is a B-plus; 66 to 67.9 percent is a B; 65 to 65.9 percent is a B-minus; 64 to 64.9 percent is a C-plus; 62 to 63.9 percent is a C; 61 to 61.9 percent is a C-minus; 60 to 60.9 percent is a D-plus; 58 to 59.9 percent is a D; 57 to 57.9 percent is a D-minus; less than 57 percent is an F.
5 Equity: The relative inequity among districts in spending per student was calculated by Management Analysis and Planning Inc., or MAP, for Quality Counts using the U.S. Census Bureau's F-33 database. The measure used, the coefficient of variation, summarizes how widely spending across a state's districts varies from the average per-pupil spending within a state. We adjusted each district's spending to account for its poor and special education students and the differing costs of hiring teachers and purchasing supplies. We excluded districts with fewer than 200 students from our calculations and assigned special weights to nonunified districts.
We used the following grading benchmarks: 1 to 3.9 percent variation is an A; 4 to 4.9 percent variation, an A-minus; 5 to 5.9 percent variation, a B-plus; 6 to 8.9 percent variation, a B; 9 to 9.9 percent variation, a B-minus; 10 to 10.9 percent variation, a C-plus; 11 to 13.9 percent variation, a C; 14 to 14.9 percent variation, a C-minus; 15 to 15.9 percent variation, a D-plus; 16 to 18.9 percent variation, a D; 19 to 19.9 percent variation, a D-minus; 20 percent or greater variation, an F. A detailed, step-by-step description of the analysis may be obtained on our WWW site.
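The Resources protocols for items 1, 2, 4, and 5 are simple enough to sketch in Python. This is an illustrative reading of the benchmarks quoted above, not EWW's code: the function names are hypothetical, band edges are treated as "greater than or equal to the lower bound" (the source leaves values such as 19.5 percent unassigned), variation below 1 percent is assumed to earn an A, and adequacy points are not capped at 100 because the source does not say whether they were.

```python
BENCHMARK_PPE = 7369  # dollars, the benchmark quoted in item 1


def adequacy_points(adjusted_ppe):
    """Item 1: adjusted per-pupil expenditure as points out of 100 (uncapped)."""
    return 100.0 * adjusted_ppe / BENCHMARK_PPE


def spending_change_points(pct_change):
    """Item 2: step function on the % change in inflation-adjusted spending."""
    for floor, points in [(20, 100), (15, 85), (10, 75), (5, 65), (0, 50)]:
        if pct_change >= floor:
            return points
    return 0  # did not keep up with inflation


# Item 4: lower bound of each band (percent spent on instruction) and its grade.
ALLOCATION_BANDS = [
    (70, "A"), (69, "A-"), (68, "B+"), (66, "B"), (65, "B-"), (64, "C+"),
    (62, "C"), (61, "C-"), (60, "D+"), (58, "D"), (57, "D-"),
]

# Item 5: lower bound of each band (coefficient of variation, percent) and its grade.
EQUITY_BANDS = [
    (20, "F"), (19, "D-"), (16, "D"), (15, "D+"), (14, "C-"), (11, "C"),
    (10, "C+"), (9, "B-"), (6, "B"), (5, "B+"), (4, "A-"),
]


def allocation_grade(pct_instruction):
    """Item 4: higher share spent on instruction earns a better grade."""
    for floor, grade in ALLOCATION_BANDS:
        if pct_instruction >= floor:
            return grade
    return "F"


def equity_grade(variation_pct):
    """Item 5: lower spending variation across districts earns a better grade."""
    for floor, grade in EQUITY_BANDS:
        if variation_pct >= floor:
            return grade
    return "A"


if __name__ == "__main__":
    # Alaska and Texas values from the Resources table above.
    for state, ppe, change, instr, cv in [
        ("Alaska", 6601, -18, 56.6, 31.9),
        ("Texas", 5889, 23, 61.4, 12.5),
    ]:
        print(state, round(adequacy_points(ppe), 1), spending_change_points(change),
              allocation_grade(instr), equity_grade(cv))
```

Keeping the bands in lookup tables rather than chained conditionals makes it easier to compare the thresholds against the quoted protocol line by line.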
| State | 96: NAEP Math | | 98: EWW "Input" Grades | | | |
| | ep4_m96 | ep8_m96 | res98 | tqual98 | sclim98 | stand98 |
| Alabama | 11 | 12 | 80 | 76 | 65 | 88 |
| Alaska | 21 | 30 | 59 | 69 | 70 | 67 |
| Arizona | 15 | 18 | 45 | 78 | 64 | 86 |
| Arkan | 13 | 13 | 82 | 73 | 72 | 71 |
| California | 11 | 17 | 42 | 85 | . | 80 |
| Colora | 22 | 25 | 45 | 83 | 70 | 72 |
| Connecticut | 31 | 31 | 84 | 93 | 83 | 78 |
| Delawar | 16 | 19 | 80 | 78 | 62 | 85 |
| Florida | 15 | 17 | 71 | 84 | 57 | 92 |
| Georgia | 13 | 16 | 76 | 81 | 64 | 89 |
| Hawaii | 16 | 16 | 67 | 76 | 52 | 60 |
| Idaho | . | . | 82 | 66 | . | 57 |
| Illinois | . | . | 72 | 69 | . | 83 |
| Indiana | 24 | 24 | 88 | 84 | 69 | 81 |
| Iowa | 22 | 31 | 81 | 73 | 79 | 39 |
| Kansas | . | . | 78 | 75 | . | 90 |
| Kentuck | 16 | 16 | 82 | 89 | 62 | 89 |
| Louisian | 8 | 7 | 69 | 86 | 62 | 80 |
| Maine | 27 | 31 | 95 | 75 | 82 | 79 |
| Marylan | 22 | 24 | 82 | 83 | 52 | 93 |
| Massachuset | 24 | 28 | 75 | 85 | 76 | 91 |
| Michig | 23 | 28 | 90 | 86 | 65 | 81 |
| Minnesota | 29 | 34 | 78 | 83 | 72 | 70 |
| Mississippi | 8 | 7 | 78 | 75 | 65 | 77 |
| Missour | 20 | 22 | 69 | 84 | 71 | 72 |
| Montana | 22 | 32 | 83 | 73 | 71 | 50 |
| Nebrask | 24 | 31 | 86 | 73 | 82 | 72 |
| Nevada | 14 | . | 70 | 75 | . | 86 |
| NewHampshire | . | . | 85 | 77 | . | 85 |
| NewJersey | 25 | . | 100 | 78 | . | 80 |
| NewMexico | 13 | 14 | 70 | 78 | 67 | 94 |
| NewYork | 20 | 22 | 84 | 85 | 63 | 95 |
| NorthCarol | 21 | 20 | 72 | 93 | 64 | 89 |
| NorthDakota | 24 | 33 | 73 | 72 | 83 | 52 |
| Ohio | . | . | 85 | 80 | . | 86 |
| Oklahoma | . | . | 78 | 92 | . | 70 |
| Oregon | 21 | 26 | 76 | 72 | 64 | 91 |
| Pennsylvania | 20 | . | 89 | 73 | . | 86 |
| RhodeIsland | 17 | 20 | 83 | 82 | 76 | 70 |
| SouthCarolin | 12 | 14 | 79 | 92 | 66 | 85 |
| SouthDakot | . | . | 68 | 72 | . | 75 |
| Tennessee | 17 | 15 | 65 | 81 | 64 | 68 |
| Texas | 25 | 21 | 86 | 78 | 76 | 88 |
| Utah | 23 | 24 | 75 | 74 | 60 | 72 |
| Vermon | 23 | 27 | 87 | 79 | 81 | 69 |
| Virginia | 19 | 21 | 75 | 83 | 68 | 92 |
| Washington | 21 | 26 | 73 | 76 | 60 | 77 |
| WestVirginia | 19 | 14 | 98 | 84 | 69 | 92 |
| Wisconsin | 27 | 32 | 91 | 83 | 79 | 84 |
| Wyoming | 19 | 22 | 56 | 66 | 74 | 73 |
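For readers without STATA, the correlation between 8th grade achievement and the standards variable can be recomputed from the two relevant columns above. The Python sketch below is an approximation of the analysis, not a reproduction of it: the columns are hand-transcribed (an assumption worth rechecking), and it drops only the states missing one of these two variables, whereas STATA's corr drops any state missing a value on any listed variable, so the coefficient may differ somewhat from Table 1.

```python
from math import sqrt

# ep8_m96 and stand98 columns from the table above, in the same state order
# (Alabama ... Wyoming). None marks a missing value ("." in the table).
EP8_M96 = [
    12, 30, 18, 13, 17, 25, 31, 19, 17, 16,
    16, None, None, 24, 31, None, 16, 7, 31, 24,
    28, 28, 34, 7, 22, 32, 31, None, None, None,
    14, 22, 20, 33, None, None, 26, None, 20, 14,
    None, 15, 21, 24, 27, 21, 26, 14, 32, 22,
]
STAND98 = [
    88, 67, 86, 71, 80, 72, 78, 85, 92, 89,
    60, 57, 83, 81, 39, 90, 89, 80, 79, 93,
    91, 81, 70, 77, 72, 50, 72, 86, 85, 80,
    94, 95, 89, 52, 86, 70, 91, 86, 70, 85,
    75, 68, 88, 72, 69, 92, 77, 92, 84, 73,
]


def pearson(xs, ys):
    """Pearson product-moment correlation over pairs with both values present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    used = sum(1 for x in EP8_M96 if x is not None)
    print(f"states used: {used}, r = {pearson(EP8_M96, STAND98):.2f}")
```

The sign of the result can be compared with the StandardsTesting row of Table 1; swapping in the ep4_m96 column gives the 4th grade counterpart.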
Last updated for link rot 7/07
Allen, M. and Yen, W. Introduction to Measurement Theory. Belmont, CA: Wadsworth (1979)
Allen, N., Jenkins, F., Kulick, E., and Zelenak C. Technical Report of the NAEP 1996 State Assessment Program in Mathematics. Washington, DC: National Center for Educational Statistics (1997)
Association for Supervision and Curriculum Development Nonrandom Human Error in Testing (5 Aug 03)
ASR-CAS Joint Study Group. Making Valid and Reliable Decisions in Determining Adequate Yearly Progress. Washington, D.C.: Chief State School Officers (02)
Barton, P.B. Too Much Testing of the Wrong Kind in K-12 Education. ETS: Princeton, NJ (6 June 1999)
Berkowitz, H. Communities Must Step in for Kids, Anchorage Daily News (18 Nov 02)
Boruch, R. The Virtues of Randomness, EducationNext (Fall 02) , Trends in International Mathematics and Science Study, 1995, 1999, 2003
Bushweller, K. Teaching to the Test, American School Board Journal (Sept 1997)
Callahan, R. Education and the Cult of Efficiency. Chicago: University of Chicago Press (1962)
Campbell, D. and Fiske, D. Convergent and Discriminant Validity in the Multitrait-Multimethod Matrix. Psychological Bulletin, 56, 81-105 (1959)
Campbell, D. and Stanley. J. Experimental and Quasi-Experimental Designs for Research. New York: Houghton Mifflin (1966)
Cook, T. and D. Campbell Quasi-Experimental Design and Analysis Issues for Field Settings. Chicago: Rand McNally (1979)
Cremin, L. The Transformation of the School. New York: Vintage (1961)
Cronbach, L. J. Designing Evaluations of Educational and Social Programs. San Francisco: Jossey Bass (1982)
Digest of Educational Statistics, Scholastic Assessment/Aptitude Test Score Averages, by State: 1974-75 to 1994-95 (1998)
Dillon, S. Thousands of Schools May Run Afoul of New Law, New York Times (16 Feb 03)
Education Committee of the States. State NCLB Plan Links (03), Educational Accountability Under NCLB, and Educational Accountability Under NCLB Revisited
Elmore, R. Unwarranted Intrusion, EducationNext (Spring 02)
Figlio, D. Aggregation and Accountability, No Child Left Behind: What Will It Take? Conference, Fordham Foundation, Washington D.C. (Feb 02).
Folstein, M., Folstein, S. and McHugh, P. "MINI-MENTAL STATE:" A Practical Method for Grading the Cognitive State of Patients for the Clinician, Journal of Psychiatric Research, 12 (3), 189-198 (1975)
Fullerton, K., T for TexasTechnology Skills, Wired (4 Dec 1999)
Goldhaber, D. What Might Go Wrong with the Accountability Measures, No Child Left Behind: What Will It Take? Conference, Fordham Foundation, Washington D.C. (Feb 02)
Haney, W. The Myth of the Texas Miracle in Education, Education Policy Analysis Archives (19 Aug 00)
Hannaway, J. and McKay, S. School Accountability and Student Achievement: The Case of Houston, EducationNext (Fall 01)
Hays, W. L. Statistics. Fort Worth: Harcourt Brace, 5th Ed. (1994)
Henriques, D. and Steinberg, G. None of the Above, FairTest (20 May 01)
Hensley, W. Speech by Willie Hensley at Bilingual Conference. Bilingual Conference: Anchorage, AK (Feb 1981).
Herrnstein, R. and Murray, C. The Bell Curve. New York: The Free Press (1994)
Hoff, D. NAEP Weighed as Measure of Accountability, Education Week on the Web (8 Mar 00)
Hoff, D. Test-Weary Schools Balk at NAEP, Education Week on the Web (16 Feb 00)
Indiana State Legislature. House Bill 246 (1897)
Innes, R. Message (16 June 02)
Jaeger, R. and Tucker, C. A Guide to Practice for Title 1 and Beyond. Washington, D.C.: Chief State School Officers (1998)
Jesness, J. Stand and Deliver Revisited: The untold story behind the famous rise--and shameful fall--of Jaime Escalante, America's master math teacher. Reasononline (July 02)
Johnson-Lewis, M. Testing Head Start to Death, Black Commentator (20 Feb 03)
Joint Committee on Testing Practices (American Educational Research Association, American Psychological Association, National Council on Measurement in Education). Standards for Educational and Psychological Testing. Washington, DC (1999)
Kane, T., Staiger, D., and Geppert, J. Randomly Accountable, EducationNext (Fall 02)
Klein, S., Hamilton, L., McCaffrey, D., and Stecher, B. What Do Test Scores in Texas Tell Us? Santa Monica, CA: RAND (00)
Krenz, C. Alaska's HSGQE Web Resources Page (03)
Koret Task Force on K-12 Education. School Accountability. Hoover Institution (02)
Kuhn, T.S. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 2nd Ed. (1962)
Lynn, R. Utah Education, The Salt Lake Tribune (21 Nov 03)
McNeil, L. Sameness, Bureaucracy and the Myth of Educational Equity: The TAAS System of Testing in Texas, Hispanic Journal of Behavioral Sciences (00)
Meehl, P. Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology. Journal of Consulting and Clinical Psychology, 46, 806-834. (1978)
Nasaw, D. Schooled to Order. New York: Oxford University Press (1979)
Neuman, S. Letter to Chief State School Officers. Washington, DC: USED (5 Dec 02).
Olson, L. NAEP Board Worries States Excluding Too Many From Tests, Education Week on the Web (19 March 03)
Olson, L. Board Acts to Bring NAEP In Line With ESEA, Education Week on the Web (29 March 02)
Olson, L. Shining a Spotlight on Results: Quality Counts '99, Education Week on the Web, 18 (17) (1999)
Phelps, R. Celebrity Research
Platt, J. Strong Inference. Science (15 Oct 1964)
Rhoades, K. and Madaus, G. Errors in Standardized Tests: A Systematic Problem. National Board on Educational Testing and Public Policy Monograph. Boston, MA: Boston College, Lynch School of Education (2003)
Sagan, C. The Fine Art of Baloney Detection. Chapter in The Demon-Haunted World, Science as a Candle in the Dark. New York: Random House (1996)
Scott, J. Let The Mock Trial Begin, EducationNews.org (27 Mar 03)
Stofflet, F., Fenton, R., and Straugh, T. Construct and Predictive Validity of the Alaska State High School Graduation Qualifying Examination: First Administration. Conference, American Educational Research Association, Seattle, WA (Apr 01)
Student. Errors of Routine Analysis. Biometrika, 19, 151-164 (1927)
Taylor, F. The Principles of Scientific Management. New York: Norton (1911)
Texas' Exit Exam Page
Thomas, C. "Education Vouchers Bode Well for Public Schools" (syndicated column), Anchorage Daily News (19 May 03)
Wise, A. Legislated Learning. Berkeley: University of California Press (1979)
Winer, B.J. Statistical Principles in Experimental Design. New York: McGraw-Hill, 2nd Ed. (1971)
Winerip, M. Defining Success in Narrow Terms, New York Times (19 Feb 03)
www.nochildleftbehind.gov/next/where/alaska.html (what the NCLB says about Alaska)
www.ed.gov/policy/elsec/leg/esea02/pg8.html#sec1308 (SEC. 1308, prohibition on creating a nationwide database except for "coordinating migrant education activities")
www.ed.gov/policy/elsec/leg/esea02/pg27.html#sec2304 (Section 2304, "Troops for Teachers" funding)
www.ed.gov/policy/elsec/leg/esea02/pg27.html#sec2307 (Section 2307, "Troops for Teachers" evaluation plan)
www.ed.gov/policy/elsec/leg/esea02/pg97.html#sec411 (Section 411, NAEP changed authorization)
www.ed.gov/policy/elsec/leg/esea02/pg112.html#sec9526 (SEC. 9526: what does "Nothing in this section shall be construed to ... prohibit the distribution of scientifically or medically true or accurate materials" mean? Could it be related to the last two entries below?)
www.ed.gov/policy/elsec/leg/esea02/pg112.html#sec9531 (SEC. 9531, prohibition against a single nationwide database)
www.ed.gov/offices/IES (E-gov statement about sacking OERI and creating the "Institute of Education Sciences," 02)
www.ed.gov/about/reports/annual/2002report/obj.doc (Press Release about Evaluation, 02: a document stating that "advertising research" shows "continuous quality improvement" was clearly written by someone with an advertising background rather than a substantive methodological or statistical one, that is, by someone who didn't know what they were talking about)
www.ed.gov/admins/tchrqual/learn/preparingteachersconference/whitehurst.html (Whitehurst, G. Teacher Preparation and Professional Development, White House Conference on Preparing Tomorrow's Teachers (02))
www.ed.gov/offices/OII/fpco (Summary of rules regarding student record privacy)