One of the contentious aspects of the Chicago teacher strike is the role of standardized testing. As is typical of the reform movement, the Chicago school district is pushing for even more standardized testing. Teachers are resistant for a variety of reasons. Research suggests that increased standardized testing does not improve student outcomes, although those results must still be considered preliminary. Time spent “teaching to the test,” an inevitable consequence of high-stakes testing, robs students and teachers of their most valuable educational resource: class time. What’s more, it is a simple fact that more standardized testing leads to more cheating, fraud, and abuse by students, teachers, and administrators alike. That is not a normative statement; it is an empirical statement. Finally, there is widespread anecdotal evidence of the extreme psychological and emotional costs that repeated high-stakes, high-pressure testing imposes on children.
I would just like to add an important element to this: we don’t need to test everyone every year to have an extremely accurate picture of how our students are performing. People say that we need to know where our students stand and if they’re improving. And indeed we do! But we can find that information without subjecting all of our students to stressful testing that takes away valuable class time and invites considerable negative washback. Appropriately stratified samples, carefully selected, can tell us what we need to know about districts, states, and the nation. And we can express the accuracy of that information with mathematical precision using statistics like standard error and confidence intervals.
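To make that concrete, here is a minimal sketch in Python of how a stratified sample yields an estimate together with its standard error and a 95% confidence interval. Everything specific in it is an assumption for illustration: the function name `stratified_estimate`, the three school groupings, the enrollment counts, and the simulated scores are all invented, and the calculation is the standard textbook estimator for a stratified simple random sample, not the procedure of any particular testing program.

```python
import math
import random
import statistics


def stratified_estimate(strata, z=1.96):
    """Estimate a population mean from a stratified random sample.

    strata: list of (population_size, sampled_scores) pairs, one per stratum.
    z: critical value for the interval (1.96 gives roughly 95% coverage).
    Returns (mean, standard_error, (ci_low, ci_high)).
    """
    total = sum(pop_size for pop_size, _ in strata)
    mean = 0.0
    variance = 0.0
    for pop_size, scores in strata:
        n = len(scores)
        weight = pop_size / total          # stratum's share of all students
        fpc = 1 - n / pop_size             # finite population correction
        mean += weight * statistics.fmean(scores)
        variance += weight ** 2 * fpc * statistics.variance(scores) / n
    se = math.sqrt(variance)
    return mean, se, (mean - z * se, mean + z * se)


if __name__ == "__main__":
    # Hypothetical district: three groups of schools with invented
    # enrollments, and 200 simulated test scores sampled from each.
    rng = random.Random(0)
    strata = [
        (40_000, [rng.gauss(62, 12) for _ in range(200)]),
        (35_000, [rng.gauss(70, 10) for _ in range(200)]),
        (25_000, [rng.gauss(78, 9) for _ in range(200)]),
    ]
    mean, se, (low, high) = stratified_estimate(strata)
    print(f"estimated mean score: {mean:.1f}")
    print(f"standard error:       {se:.2f}")
    print(f"95% interval:         {low:.1f} to {high:.1f}")
```

In this made-up example, 600 sampled students stand in for 100,000. The standard error shrinks roughly with the square root of the sample size, which is why a carefully drawn sample can describe a district without testing every child.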
As Dr. Stephen Krashen, Professor Emeritus in Education at the University of Southern California and someone whose expertise and credentials are beyond reproach, has said, “One function of such tests is to compare groups and investigate factors related to high achievement, which works if tests are valid and are low-stakes and thus do not encourage cheating. But we don’t have to test every child in every grade every year…. When you go to doctors, they don’t take all your blood; they take only a sample.” We have developed validity and reliability tests, measures of statistical error, and processes for accounting for that error precisely so that we don’t have to check everyone. And lest you think that checking everyone is necessarily more accurate than extrapolating from samples: it is not, even if we assume the validity of the test in assessing its given construct.
I can only conclude two things: first, that people simply don’t understand the power and accuracy of inferential statistics to describe complex realities like student academic achievement; and second, that people resist extrapolation out of the impression that it offers a less effective bludgeon with which to attack teachers.
Look, I consider myself a quantitative researcher, among other things, and I hope to publish on questions of language testing and assessment. Just this past week I agreed to peer review for a major language assessment journal. I’m not opposed to testing entirely, not at all. But the limitations of tests are real, their negative consequences are empirically verified, and, most importantly, their primary strength is ignored when they are used on all the kids, all the time. For our data collection, testing all kids twice during their educational careers, as the gold standard NAEP tests do, in addition to targeted stratified samples that can be minimally intrusive, is more than enough. If the purpose of testing isn’t data collection, but rather having an instrument to assault teachers, well, that’s a total betrayal of our children and our educational system.