This blog is about 3 stories.

1. The start-up year for a very different sort of Graduate School of Education. It's a tiny subset of...
2. ...The much larger, national effort to transform teaching and teachers. That is a big subset of...
3. ...A multi-kajillion-dollar effort to improve the ludicrous odds (7% or so) of a poor kid ever getting a college diploma.

Measuring Schools

Posted: June 15th, 2012 | 12 Comments

I’ve been really enjoying the blogging of a guy named Matt DiCarlo. He blogs for the Shanker Institute.

Good one here on charter school research. On Apollo 20 tutoring/turnaround.

And recently on measuring schools. That’s what we’ll look at today.

He writes:

Roughly speaking, in addition to the inevitable measurement error, a school’s absolute performance level reflects a combination of two factors:

Students’ performance levels upon entry into the school;
Their improvement while attending the school.

Schools cannot control the former (which students they serve). It varies widely, and schools only serve students for a few years at most. They can, however, control the latter (whether students improve while enrolled).

So, why not just use the latter – growth – directly? Why would we hold schools accountable for an outcome that is largely out of their hands, when we have the option of isolating (at least approximately) the portion that they actually can control?

I agree with him. In Massachusetts, for example, the newspapers publish the absolute scores of schools. They’ve done that since MCAS began. Suburban superintendents like the absolute score list, by and large.

No Child Left Behind (remember that law that Ted Kennedy and George Bush passed?) also requires states to publish “subgroup data.” Black and Hispanic kids; poor kids; special ed kids; English Language Learners. How were all of those groups doing?

Some suburban superintendents up here disliked that. What if all your white kids do well but your black kids do badly? That used to be swept under the rug. Now it was there in the newspaper, on a “NCLB Report Card.”

So the Massachusetts Association of Superintendents began lobbying the state to calculate MCAS growth scores, too. The idea was to show that the “subgroup kids” were making good progress, even if (of course) they had not reached the level of the middle class white kids.

Also, I’m told, they wanted to show that charter schools were not succeeding. They just took the good kids, that’s why they had high absolute scores. Growth data, they thought, would tell that tale.

The state agreed to the supes’ wish. So in 2010, they began to publish MCAS Growth Data.

Despite that, test score growth — the stuff a school controls — does not get very much news media coverage.

Why?

A few reasons.

1. Most “ranking” lists are absolute. Baseball standings. Wealthiest people.

True dat, but some lists are about growth, not absolute. Stock prices, for example. Typically they list “Big Gainers” and “Big Losers” on any given day or year. They never list stock prices in absolute terms. Otherwise Berkshire Hathaway would always be #1. This morning it cost $122,600 for a single share.

2. Who is the audience?

Newspaper readers are more likely to be interested in how their suburban school stacks up against the nearby suburban school.

3. What does the growth story tell us?

And then supes went silent. Crickets. Why?

a. Charters tend to rank quite high on the “MCAS Growth” list. In every grade, every subject, every year. Example, this happens to be Grade 5.

Not what supes had in mind.

Why? Some of that is actual quality.

(The Boston area charters in particular have done well. Less so of suburban Massachusetts charters, as measured by MIT economists. Remember, across the USA, the word “charter school” on average does not correlate with “high quality.” But in Boston, it does).

Some of the over-representation is just Stats 101: if your organization is small (fewer total children who take the test), it will be easier to be near the top or bottom of any particular list.

So look at the “Low Growth” list — the bottom ten. Again, over-representation of charters.
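The Stats 101 point can be sketched with a quick simulation. This is only an illustration with made-up numbers, not MCAS data: the school counts, school sizes, and the idea of drawing each student's growth percentile at random from 1 to 99 are all assumptions. Every simulated school is equally good by construction, yet the small ones should still crowd both the top ten and the bottom ten.

```python
import random
import statistics


def simulate_rankings(n_small=100, n_large=100, small_size=40,
                      large_size=400, list_length=10, seed=2012):
    """Rank hypothetical schools by median growth percentile.

    All numbers here are illustrative assumptions, not real MCAS figures.
    """
    rng = random.Random(seed)
    schools = []
    for kind, count, size in (("small", n_small, small_size),
                              ("large", n_large, large_size)):
        for _ in range(count):
            # Every student draws a growth percentile from the same 1-99
            # range, so no school is truly better or worse than another.
            scores = [rng.randint(1, 99) for _ in range(size)]
            schools.append((statistics.median(scores), kind))

    # Sort from lowest to highest median growth percentile.
    schools.sort(key=lambda school: school[0])
    bottom = schools[:list_length]
    top = schools[-list_length:]
    return {
        "small_in_top": sum(kind == "small" for _, kind in top),
        "small_in_bottom": sum(kind == "small" for _, kind in bottom),
    }


result = simulate_rankings()
print(result)
```

Half the simulated schools are small, so a "fair" top-ten list would have about five of them. The gap between that and what the simulation prints is pure sample size, with no quality difference involved.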

b. But there’s a bigger issue than charters.

Several suburban districts, with high absolute scores, have average growth scores.

That may irritate the very suburban superintendents who had been clamoring for public growth scores. Per DiCarlo, if top suburban districts in MCAS absolute scores are heavily the result of kids simply arriving to Grade K in good shape, it would rule out “brilliant superintendent” as the cause.

When the growth scores — the ones DiCarlo explains are what the schools actually control — are average, sometimes a superintendent cries foul.

For example, let’s examine this news article from a tony Massachusetts suburb:

The Winchester School District again shows why it is regarded as one of the top districts in the state. The school district received a very high performance rating from the state.

Two weeks ago the state released the MCAS scores and Winchester High School was the top school in the state in the English Language Arts Exam goes for percentage of students who scored advanced.

All good. High absolute scores. We tout that too in our school.

Now let’s look at another article, this one more probing:

“Winchester students are doing very well, certainly compared to the state averages,” said School Superintendant William McAlduff. “Our scoring trends, in almost every case, are mirroring the state level, but at a much higher rate.”

Now if the reporter knew about Growth scores, there’d be a logical follow-up question, because growth is not at a much higher rate.

Because of NCLB, however, the reporter was able to ask at least the “subgroup” question.

At the same time, Winchester struggled in meeting expectations for “adequate yearly progress” (AYP). The federal No Child Left Behind Act establishes guidelines for AYP, which requires schools to boost test scores each year.

Three Winchester schools—Ambrose and Lynch elementary schools, and McCall Middle School—failed to meet the state’s benchmarks for AYP. McAlduff said the failure wasn’t a poor reflection on the schools.

That is, it’s not our fault if the poor/minority kids don’t do well, it’s not our schools’ job, it’s the parents’ job.

Hmm. What if the supe had said this instead:

We’re proud of our kids’ absolute scores. Our parents send us well-educated kids and the teachers don’t mess anything up and the teachers do a good job from there.

But the growth data and the subgroup data show we need to get better. And we will work hard to do so.

The 2011 growth data across our whole district has us at the 54th percentile in English growth, and the 48th percentile in math growth. Doesn’t get much more average than that. We are safely average. We don’t want to be average.

The subgroup data tells us 3 stories.

1. We haven’t done a great job of helping the kids from poor families and other subgroups. They lag our other kids, by a lot.

Yet we only have about 20 such kids in any particular grade in the whole district. I.e., 19 poor kids in Grade 4, 23 in Grade 5, etc. Out of 300+ total kids per grade.

So….if each teacher would take responsibility to tutor just ONE kid at a high-dosage, we could probably get the poor kids to make huge progress and join their peers.

Moreover, if just the elementary teachers did this, kids from all income levels would arrive at the middle schools in decent shape.

2. The achievement gap in our schools does not close as kids spend more years with us. It actually seems to expand a bit the more time they spend in our schools. We should figure out why.

3. Our kids make slightly above average gains — compared to other suburbs — in English. But we make slightly below average gains in math. To improve, we plan to do X, Y, and Z.

Well, if he said that publicly, perhaps he’d stir up a hornet’s nest. But then there would be opportunity….


12 Comments on “Measuring Schools”

  1. 1: Tom Hoffman said at 10:45 am on June 15th, 2012:

    The problem is that past a certain point, this entire theory of change (test, publish the data, punish, etc.) is simply incorrect. Better data doesn’t lead to better results on a large scale.

    Not hypothetically, or in principle, but practically, based on observation and statistics.

    In particular, what is anyone actually supposed to conclude from the generation of increasingly abstract, indirect, and narrowly circumscribed data which purports to demonstrate that schools which are regarded by their communities and peers as successful are, in fact, not successful?

    I mean, as a parent. I’ve looked at the new RI growth data per school, and I’m happy the school my daughter will be attending has high growth in a high-poverty environment, but I’m not also thinking “God, I’m glad she’s not going to that nice suburban school with lower growth and much higher scores.” I’d take that too!

    Also, I’ll start caring more about subgroup performance within schools once schools are desegregated. What small slivers do within each school isn’t particularly useful.

  2. 2: Michael Goldstein said at 11:04 am on June 15th, 2012:

    Tom, good thought.

    Still, let me ask you this.

    1. What if your daughter was actually testing around the 15th percentile or so? (I’m guessing she’s not, with such a well-read papa).

    Would you then care more about whether the school was high growth?

    2. More broadly, and less driven by parents, are there not districts that have improved how they help lower-performing kids precisely due to subgroup data and growth data? I don’t know the answer to that.

  3. 3: Jen said at 1:39 pm on June 15th, 2012:

    Yes and yes. The problem is that parents often don’t know what to make of growth data. In our district, scores are published yearly in a handy little newsprint booklet — but I may well be the only person who spends a lot of time looking and comparing — but it’s still not that easy to compare schools, from the booklet or on the website where they are reported unless you really want to.

    The growth scores are in a chart at the beginning of the booklet and are actually easier to compare — but hardly anyone seems to know they are there. If they did, they’d find out interesting things…

    For instance, my son’s magnet school in our district (58% Black, 32% White, 8% “multi-ethnic” and the rest “other,” with a 75% free/reduced lunch rate, which mirrors a 74% district rate) does pretty darn well (top 5 in both reading and math overall, only elementary top 5 in both). The growth scores show 1 and 2 standard deviation gains above district average for low-income and Black students and average gains for White kids (when the scores are high/bumping up near the upper limit it’s hard to show a lot of growth!).

    It’s when you compare it to other district schools that it’s shocking.

    Another school in the district is considered by many to be a “better school” (although when you see these numbers, I’m sad to say you’ll likely know the reason it’s considered better) certainly — its breakdown is 30% Black, 51% White, 9% Asian, 4% Hispanic and then other and (only) a 31% free/reduced lunch rate.

    That school has more than 60% of their Black kids scoring BELOW Proficient in grades 3-5 reading, while my son’s school has 27% below proficiency. For White kids, other school 11% below proficiency, my kid’s school 10%.

    Scores are similar, though a little closer in math. But no one seems to notice that while the white kids are faring well at both schools, one school is clearly doing a much better job than the other at educating their black kids.

  4. 4: Allison Jacobs Friedmann said at 9:10 am on June 16th, 2012:

    I think growth scores are tough because the MCAS quite frankly isn’t that hard. I know that it is harder than almost any other state test, but it is possible for a district to max out their scores. Then it is hard to show growth. My school gives a released 3rd grade MCAS at the end of 2nd grade. In math, my top quartile of kids scored 100%, 100%, 98%, 97% and 97%. It is almost impossible for me to make much growth with my top quartile of kids as a result. So I don’t look too much at growth statistics when I look at upper income, suburban schools.
    Like Jen, when we were thinking about moving to the suburbs (and when we eventually look at Boston Public schools), I look at subgroups. In nearly all of the suburban schools I looked at, low income kids and kids of color performed no better than they do in BPS. And upper income kids in Boston scored almost exactly the same as upper income kids in the suburbs. So I realized MCAS scores weren’t giving me that good a sense of how the schools were doing. Then Paul and I went out to look at actual schools. We didn’t think the teaching was any better in the suburbs than in the city. The level of discourse was higher, but the teaching was sub-par in most classrooms with a few standout great classrooms in each school. I was in one 45-minute writers’ workshop in which a total of 5 minutes were spent on writing. Most kids didn’t write more than a sentence. That didn’t seem all that different to me than the city schools I have been to. And when the suburban teachers heard that I teach low income kids in the city, they gave me a sympathetic look and commented about how hard it is to teach METCO kids. There was a lot of blaming parents, home life etc.

    If we had wanted to have a bigger lawn or lower crime rates, it might have still made sense to move to the suburbs, but since we love the city and were only thinking about the suburbs because of schools, it just didn’t seem worth it.

    I did find one school in the suburbs that has a large number of low income kids and they outperform most schools in the state with those kids. And not surprisingly, there was some great stuff going on in that school. So when we look at city schools before entering the lottery, we will look for schools that help low income kids perform better than would be predicted by their SES. That’s where I think the really great teaching is happening.

  5. 5: Links 6/17/12 | Mike the Mad Biologist said at 4:44 pm on June 17th, 2012:

    [...] Adelson’s Mad Money Boston Dudley Do-Nothing Nobody loves standards (and that’s O.K.) Measuring Schools [...]

  6. 6: joemac53 said at 5:30 pm on June 17th, 2012:

    Please understand a model of growth that plateaus for the last 20 per cent. It becomes increasingly difficult to get the last 20 per cent to whatever standard you want to measure. Growth models are never linear near the end.
    That said, if a school is committed to helping close an achievement gap, but that gap resists change, do you look for factors outside school that are causing the resistance?

  7. 7: Michael Goldstein said at 10:53 am on June 18th, 2012:

    Jen, thanks for the example. I have no idea how the Florida system works, but I wonder if this whole “school report card” letter grade thing draws attention to the type of thing you describe.

    Allison, great comment. What was that school by the way? Would love to turn your thoughts into a guest post.

    JoeMac, good question. Yes, I think schools do in particular try to “flip” some disengaged parents into being more involved.

    Also, great point on the final 20%. Although I’d add that there are growth models for single kids, single teachers, and whole schools.

    In MA, the whole school growth only compares one institution to the others. So if the best school only gets 70% of its kids over a particular bar, it would still be the 99th percentile here in “SGP.”

  8. 8: Ed Liu said at 4:55 pm on June 18th, 2012:

    I think you ALWAYS have to look at both growth and absolute performance levels. SGPs can be very misleading. You can have sky-high SGP scores and still have very low absolute proficiency levels. Some of the schools that tout great progress with median SGPs of 80 still have a ways to go, when you realize that only a third of their students are actually scoring at proficient or above. Similarly, as Allison noted, a school where students are already scoring very highly has little room to grow. Big differences in SGPs are based on very little actual variation in growth. A similar problem happens when you hear about medical or public health studies that tout that X increases the likelihood of death 100%. Sounds terrible, but without info about absolute level of risk, one can’t judge what that means. An increase of risk from 1 in 1,000,000 to 2 in 1,000,000 is a 100% increase. People need to know both figures to judge properly.
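The risk arithmetic in the comment above can be checked in a few lines. This is just a sketch: the function name `risk_change` is made up for illustration, and the only inputs are the two risks quoted in the comment.

```python
from fractions import Fraction


def risk_change(baseline, exposed):
    """Return (relative increase, absolute increase) between two risks."""
    absolute_increase = exposed - baseline
    relative_increase = absolute_increase / baseline
    return relative_increase, absolute_increase


# The two risks quoted in the comment: 1 in 1,000,000 and 2 in 1,000,000.
baseline = Fraction(1, 1_000_000)
exposed = Fraction(2, 1_000_000)

relative, absolute = risk_change(baseline, exposed)
print(f"relative increase: {float(relative):.0%}")
print(f"absolute increase: {absolute.numerator} in {absolute.denominator:,}")
```

The relative figure makes the headline, and the absolute figure says how worried to be. A median SGP and a proficiency rate arguably play the same two roles for a school.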

  9. 9: Allison Jacobs Friedmann said at 8:56 pm on June 18th, 2012:

    Mike- I’d be happy to have you repost my comment. Passing on the perspective that I gained when visiting suburban schools has been one of my missions this year.

    joemac- I think there has long been a debate about how to close the achievement gap: Do you fix the inequity in social services so that the lives of low income kids are more stable or do you try to educate kids who have unstable lives. Of course the answer is that you have to do both. I can tell you that universal health care in MA has made a huge difference for my kids. Chronic conditions are treated long term, practically eliminating middle of the night emergency room visits for conditions like asthma. My kids have glasses. It is huge for them. But societal change is slow. I don’t think we have time to wait for societal issues to be solved. There are kids who need to learn today. So I do what Mike talks about, trying to connect with families in ways that will allow them to support their children’s learning. I’ll hold a curriculum night in which I teach you math games you can play with your child. I’ll even give you all the supplies you need to play the game. And you can get a lot of parents to support their kids well. Can I tell a parent how to find time to play said math games when she is working double shifts at minimum wage jobs? No. And there are the parents whose lives are so unstable that they can’t possibly support their kids’ learning. So then the school has to do that job. We do the job of parent and school. And it can work for some kids. It is never quite as good as when the parent has the capacity to support their child’s learning because the child is never going to love anyone as much as their parent, but many kids can get through with the school doing 95% of the support. If schools CAN do that for kids, don’t they have the responsibility to do it?

  10. 10: Tom Hoffman said at 10:26 am on June 19th, 2012:

    1. What if your daughter was actually testing around the 15th percentile or so? (I’m guessing she’s not, with such a well-read papa).
    Would you then care more about whether the school was high growth?

    On one hand I would want a school that has more specific interventions for struggling readers, but on the other hand, I’d think this would make me even more risk-averse, which would tend to make one flee to the suburbs.

    2. More broadly, and less driven by parents, are there not districts that have improved how they help lower-performing kids precisely due to subgroup data and growth data? I don’t know the answer to that.

    It certainly hasn’t happened as much as we hoped when NCLB was launched, but beyond that, it is a strategy for integrated schools and districts, and we mostly don’t have those anyhow.

  11. 11: Paul Schlichtman said at 11:28 pm on June 20th, 2012:

    Growth scores are just an indicator, but it’s a really good indicator. The scores do a very effective job at pointing us toward schools, grades, and classrooms that are performing really well or struggling to move their students forward.

    For school leaders, combining growth scores with achievement scores (including MCAS sub-scores), report card grades, and other data is the way to get a sense of what is happening in a school.

    You mention that charters are more likely to be at the top or bottom of the list, and call that a statistical effect. That’s true, as the larger the number of students in the distribution, the smaller the variance of the mean percentile scores. Thus, a large district with a median growth score of 60 is statistically less likely than a small district with a median growth score of 60.

    The best way to look at charters and growth scores is to compare to other schools, rather than districts, for the statistical reasons you mention. When I look at district scores, I usually use a minimum of 300 students tested in order to have a valid comparison.

    I also agree that many suburban districts do not understand the meaning of growth scores, and if people knew how extreme a median growth score of 38 is, there would be much more concern about student outcomes.

    The new state school and district accountability system, which gives equal weight to growth and achievement, should inspire districts to take a more thoughtful and critical look at student achievement. Let’s hope so.

  12. 12: Paul Schlichtman said at 8:06 am on June 21st, 2012:

    Just to illustrate the distribution of median Student Growth Percentile scores in Massachusetts. This file shows the difference in the distribution between schools, all districts, and large (more than 300 students tested) districts.

    http://www.schlichtman.com/SGP_school_district_distribution.pdf

