10th Period: Testing the Boundaries: Part II - What Can Outliers Teach Us?

Monday, August 22, 2016


We've spent 12 years obsessing over two things: tests and achievement gaps. Yet it appears the effort has been for naught, as the achievement gap has continued to grow.

However, there is some good news. The state's poorest districts saw the biggest percentage improvements in their test scores over those 12 years. The bad news is that this improvement, while slightly narrowing the raw score gap between rich and poor districts, widened the state's relative performance gap. In other words, it wasn't big enough to make a difference in relative terms, because high-performing districts also significantly improved their scores.

You can see the problem in the following charts. Pay attention to the gap between the "Suburban, very low student poverty" and "Urban, very high student poverty" categories. You'll see impressive improvement in the high-poverty category, which includes Ohio's so-called "Big 8" urban districts: Akron, Canton, Cincinnati, Cleveland, Columbus, Dayton, Toledo and Youngstown. These are the districts state lawmakers have targeted for years as "failing" kids. Yet looking at the raw score improvement, it's hard not to be impressed by the 12-year jump.

Here's the problem: All districts improved, even the wealthiest ones. So despite the state's poorest districts' impressive improvement, the other districts improved enough to widen the relative performance gap. Compared with 2003-2004, the wealthiest districts made up a significantly larger share of the top 10% of mini-PI scores in 2014-2015, and the poorest districts made up a significantly larger share of the bottom 10%.

But it cannot be ignored that Ohio's maligned urban districts improved their math scores by more than 21%, and all districts with high or very high poverty saw scores jump by 20% or more. That's impressive. But it wasn't enough to improve their standing among Ohio's districts. In fact, they dropped relative to their wealthier brethren.

So what gives? Why did a 20% improvement lead to a widening of the achievement gap? A major reason is the ceiling on the mini-PI score. PI scores top out at 120; that's the maximum any district could receive. And while some districts get close, a perfect 120 is probably impossible. In the 2003-2004 school year, the average PI score on OGT math in the state's wealthiest suburban districts was 102.84, a mere 17.16 points from perfection. So the greatest percentage improvement they could achieve was just under 17%. The average for the state's poorest urban districts was 68.26, or 51.74 points from perfection. So the greatest percentage improvement they could achieve was about 76%.
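To make the ceiling arithmetic concrete, here is a minimal Python sketch using the two 2003-2004 averages quoted above. It assumes the 120-point maximum applies to these averages; the function name `headroom` is just an illustrative label, not part of any official scoring tool.

```python
# Ceiling on the PI scale, as stated in the post (assumed to apply to mini-PI too).
PI_MAX = 120.0


def headroom(score: float, ceiling: float = PI_MAX) -> tuple[float, float]:
    """Return (points left to the ceiling, maximum possible % gain)."""
    points_left = ceiling - score
    max_pct_gain = 100.0 * points_left / score
    return points_left, max_pct_gain


# 2003-2004 OGT math averages quoted in the post.
for label, score in [("Suburban, very low student poverty", 102.84),
                     ("Urban, very high student poverty", 68.26)]:
    points, pct = headroom(score)
    print(f"{label}: {points:.2f} points left, max gain {pct:.1f}%")
```

The wealthy group's entire possible gain is smaller than the 20% the poorest districts actually achieved, which is why raw percentage growth alone says little about relative standing.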

In the intervening years, the wealthiest districts rose to an average of 112.25, an amazingly close 7.75 points from perfection, closing more than half of the distance to their maximum possible improvement. The state's urban districts rose to 82.26, closing less than a third of theirs.
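The same comparison can be expressed as the share of the starting distance to 120 that each group actually closed by 2014-2015. This sketch uses only the averages quoted in the post; the helper name `share_of_headroom_closed` is hypothetical. It also computes the raw-point gap between the two groups, which narrowed over the same period.

```python
PI_MAX = 120.0  # ceiling stated in the post


def share_of_headroom_closed(start, end, ceiling=PI_MAX):
    """Fraction of the starting distance to the ceiling that was closed."""
    return (end - start) / (ceiling - start)


# Averages quoted in the post: (2003-2004, 2014-2015).
suburban = share_of_headroom_closed(102.84, 112.25)
urban = share_of_headroom_closed(68.26, 82.26)
print(f"Suburban, very low poverty: closed {suburban:.0%} of its gap to 120")
print(f"Urban, very high poverty:   closed {urban:.0%} of its gap to 120")

# The raw-point gap between the two groups still narrowed.
raw_gap_2004 = 102.84 - 68.26
raw_gap_2015 = 112.25 - 82.26
print(f"Raw gap: {raw_gap_2004:.2f} -> {raw_gap_2015:.2f} points")
```

Because both quantities share the same 2003-2004 baseline, the fraction of headroom closed is identical to the fraction of the maximum percentage improvement achieved: roughly 55% for the wealthiest districts versus roughly 27% for the urban ones.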

What this meant is that more suburban districts performed at an elite level, while the state's urban districts, despite performing much better than they had in 2003-2004, remained mired in the bottom 10% of mini-PI scores.

But this doesn't mean there aren't outliers. Take Lakota in Sandusky County, classified by the state as a rural, poor district. In 2004, its OGT math score ranked 511th of 608 districts. Not good. But by 2015, its score had jumped so much that it is now in the top third of districts. Or take Maplewood in Trumbull County. In 2004, it was in the bottom half of Ohio districts on OGT math scores. Now, it's in the top 5%.

In fact, rural districts were the state's greatest rank jumpers, making up almost half of the districts with the biggest rank improvements.

Here's the other problem: They also made up about 40% of the biggest rank fallers.

Tellingly, no urban district was in the top 10% of climbers or fallers.

Again, is there something interesting going on in the districts with the greatest improvements? Or something awful happening in the fallers? Or is it just statistical noise -- a true outlier? One would have to go to the district and see what, if any, change has led to the relative improvement. But given the predictive force of poverty on these scores, normal statistical variation would seem to explain much of this difference.
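One way to see how much movement noise alone can produce is a small simulation. The sketch below is illustrative only: it assumes every district has a fixed underlying level that never changes, adds independent year-to-year "cohort" noise, and ranks the districts twice. The count of 608 districts comes from the post; the mean (85), spread of true levels (SD 10) and noise (SD 4) are made-up parameters, and the 120-point ceiling is not modeled. Any district that climbs or falls here does so purely by chance.

```python
import random


def rank_desc(scores):
    """Return 1-based ranks, where rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def simulate_rank_changes(n_districts=608, mean=85.0, true_sd=10.0,
                          noise_sd=4.0, seed=1):
    """Simulate two years of scores where no district's true level changes.

    Returns each district's rank change (old rank minus new rank), so a
    positive value means the district climbed purely because of noise.
    All parameters except n_districts are hypothetical.
    """
    rng = random.Random(seed)
    true_level = [rng.gauss(mean, true_sd) for _ in range(n_districts)]
    # Each year, a different cohort adds independent noise around the true level.
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_level]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_level]
    r1, r2 = rank_desc(year1), rank_desc(year2)
    return [old - new for old, new in zip(r1, r2)]


changes = simulate_rank_changes()
movers = sum(1 for c in changes if abs(c) >= 100)
print(f"Largest climb: {max(changes)} places")
print(f"Largest fall: {-min(changes)} places")
print(f"Districts moving 100+ places on noise alone: {movers}")
```

Raising `noise_sd` relative to `true_sd` shows how quickly rank swings grow once cohort-to-cohort noise is large compared with the real differences between districts.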

What does all this mean from a policy perspective? Well, if improvement or slippage can be explained in large part by statistical variation, should we be granting kudos or shame to districts that grow or slip? It'd be easy to devise a method to reward districts that show dramatic improvement on scores, even if their performance relative to other districts remains low. Likewise, it would be easy to punish districts that slip. But should we, given what we know about statistical variation in these cases? There will always be outliers: districts that out- or underperform their demographics. But if variation tells us anything, it's that over- or underperformance could, in fact, be nothing more than yet another expected statistical result of standardized testing. Every district sees strong and weak classes pass through its system.

In other words, a district's score may not indicate the actual quality of its educational program. Yet under our current accountability system, it is the dominant measure of a school's or district's quality.

But that doesn't mean these score changes mean nothing. Figuring out what they do mean is a great challenge, but one we must meet to understand how, if at all, they should inform our nation's education policy.