Can You and I Share Student Test Scores?
Posted: December 4th, 2012 | Author: Michael Goldstein
I have an idea. Many of my ideas are unworkable, and some are plain ol’ dumb. What do you think of this one?
A policy peril: efforts to fix “bad teachers” sometimes harm “good teachers.” Let me give you an example.
A. Teachers: Short-term vs Medium-term conflict
Teachers are increasingly held accountable, in part, by the test score growth of their students. Many states plan to eventually rate teachers this way (although only a tiny fraction of teachers are actually rated this way right now, in places like Washington, DC).
Here is a distortion that can arise. The short term (what can plausibly be accomplished from September through April, before the state exam in May) competes directly against the “medium term” (what can be accomplished in 20 months, i.e., two school years).
I described last week the “competing responsibility” problem for elementary teachers in high-poverty schools. You’re supposed to teach “new stuff” — like the concept of “perimeter.” But you have a lot of kids who can’t even add or subtract.
The obvious way for a “whole school” to solve this is to “first really fix the basics,” then “teach the new concepts.”
But if you only have 140 math lessons before the state test, you can’t plausibly do both in a single year.
Often what happens is that the Grade 3 teacher, as an individual, teaches the new concepts (a list of standards) and, without a clear idea of the dosage needed to truly and fully fix the basic skill deficits, sort of “lobs in” some class time and a bit of after-school time to “make a dent” there.
What happens the following year?
The Grade 4 teacher does the same. He teaches an updated list of new concepts, still lacking enough firepower to fully fix the basics.
B. Can this approach work well in math? Does teaching new ideas lend itself well to “fixing as you go along”?
Some math people say yes. “New math” people.
I think they’re generally wrong. Not 100% wrong, but 70% wrong. “Jen” put it well in the Comments section a few days ago:
(T)here is NO depth or complexity or deep thinking possible without the basic knowledge. I’m not saying kids should do pages of worksheets, which is always how it gets painted (drill and kill). But if kids don’t do enough of the basics to really “get” addition vs. subtraction, for example, they are never going to be successful at deeper thinking tasks. They can’t be.
I also saw these same kids sans basic fact skills and basic thinking skills at the middle school level. I’d take problems back to the “is the answer going to be bigger or smaller than , bigger or smaller than before we even began trying the problem — otherwise they always just added or if they knew it was subtraction, they just subtracted using the “easiest” combination to subtract.
Early learning needs to include TONS of practice. Enough so that those basic facts are not requiring actual processing. Again, there are tons of games, activities, real-life situations, songs, etc. that can be used to do this, but it must be done.
C. Test Design Weaknesses Exacerbate the Short-V-Medium Term Tradeoff
I know the Massachusetts tests well but not those of other states, so I’m not sure how pervasive this is.
Here, some MCAS tests have a key design flaw: they lack a mix of easy and hard problems, even though many schools have lots of kids who don’t even know the easy ones.
Alas, getting struggling 4th graders to master easy stuff is not enough — by itself — to get even one additional problem right on the 4th grade MCAS.
If you have 140 arrows (math lessons from September through April) in your quiver and shoot half of them at “easy stuff,” you probably won’t solve the skill deficits (setting up kids for the future), and you’ll also do a lousy job with the 40% of your students who do have strong basic skills (because you won’t have “covered” the new concepts).
If you shoot all 140 arrows at basic stuff, you’re even more screwed.
So you shoot most of the arrows at “new concepts.” You can at least pretend that somehow you’re fixing the basic skill issues, and you know for sure that the 40% of your students with strong basics are indeed understanding the new material and will succeed on MCAS.
Bad options.
(D. Fixing The Test Problem)
This is not the point of this blog, but while I’m at it, let me explain how to fix this problem.
That is, imagine a good scenario that doesn’t currently exist: a Grade 4 test that measured both
1) easy stuff like whether kids could add or subtract, or what a rectangle is,
and
2) harder stuff (in relative terms).
That way, if your 4th graders don’t know the easy stuff and you work hard as a teacher to remedy that, your kids’ MCAS scores will go up.
Good! As it should be. Then the interests of the teacher, the kid, and the state are all aligned. You’d appropriately get credit. You’d have set up the Grade 5, 6, 7, and 8 math teachers for future success. Down the road, they’d get to teach kids who had (at last) mastered the easy stuff, so they could focus on harder stuff instead (typically the “grade-level standards”). You’d stop “passing along the problem” to the next teacher.
E. Policy Idea (Assuming The Flawed Tests Remain): Can Teachers Ask To Tag Team?
Could 2 sequential teachers — for example, Grade 3 and Grade 4 — “request to co-own responsibility for the 4th grade math MCAS (state test), and not be accountable for Grade 3?”
If you and I could combine forces, we could draw up a 20-month plan to get to the finish line, instead of each drawing up an 8-month plan. Most likely, as the Grade 3 teacher, I’d solve math basics issues by “whatever means and dosage necessary” (during class, before and after school, Saturdays, summer), and you’d help, as the 4th grade teacher. We’d tag team it. We’d mobilize a bunch of volunteers. We’d thoughtfully deploy the 3rd graders who both know the basic stuff cold and like teaching other kids.
I wouldn’t get dinged on the Grade 3 MCAS (kids could still sit for the test). Instead, I’m on the hook with you, the 4th grade teacher: we’ve promised the kids, the parents, and the Commonwealth of Massachusetts that the kids will do well in 4th grade. Most (not all) of the “new stuff,” from both the 3rd and 4th grade standards, would get covered in 4th grade.
We could limit this “waiver” to teachers who have an established track record of success, and perhaps to those who’ve taught for many years, since teachers who’ve taught at least 10 years are statistically very likely to stay another 20.
And obviously only those who believe in this particular approach to remediation, and who really like and respect the other teacher, would self-select to bind their fates together. It’d work really well to pair one teacher who enjoys fixing basic skill deficits with another who enjoys explaining new ideas. (I’d guess there are more of the latter than the former.)
This might also free some teachers to work on simpler stuff with kids who need it. Sometimes teachers get dinged (wrongly) for doing that. Here’s commenter “Kevin” on the Core Knowledge Blog:
In my district there is a feedback form which administrators use when they walk through a classroom. The administrator attempts to identify and mark what level of Bloom’s the teacher is teaching to. It is considered superior to be at the higher levels, no matter what students are “creating” or “evaluating”. Creating a poster is considered far superior to learning the fundamentals of linear equations, because students are “creating”, regardless of the content. Bloom’s taxonomy, along with others, has become so misinterpreted and misapplied that many teachers and administrators believe students can apply higher-order thinking without the necessary knowledge.
Here’s a related blog from the archives, about Jaime Escalante and teamwork.

Couple things:
I’ve convinced myself that 5th grade math is the key to the overall success of no-excuses middle schools. If you start fresh with 5th graders and go all out, you have a fighting chance of hammering them through almost all of elementary math in one year. It is a unique opportunity to maximize growth.
One would hope that newfangled adaptive tests would help with the issue of not showing growth among low-performing students.
Mike,
Do you guys use the MAP assessment or anything similar that measures growth? Besides the different state tests, almost all KIPP schools use MAP to measure growth from fall to spring and from spring to spring. Since the test is adaptive and uses the same scoring scale from K-12, you can see growth even from kids at the high and low end of the distribution. The test is far, far from perfect, but it really helps our teachers focus on moving every kid as far as possible each year and not simply the triaging you describe above.
Tom – good stuff as always.
Ben – it varies by grade level. For example, we use STEP in the early grades to measure literacy.
However, typically where there is a state test, school leaders and teachers feel pressure to maximize state test results, no?
I.e., if a KIPP teacher gets nice gains on MAP and kids flub MCAS, does that go over well?
And by contrast, if kids do well on MCAS but poorly on MAP, then I’m guessing it’s not considered a big deal….but let me know if I’m off here.
I think it’s interesting that at a lot of top-performing charter schools (e.g., MATCH, Edward Brooke, Excel), the test scores already reflect the trends you would expect from a “tag team” approach: they are lower in fifth and sixth grade but climb steadily in 7th and 8th grade (or in 10th grade for the schools that have a high school). This could just be the effect of spending cumulative time in a good school, but I wonder if part of it has to do with the curricular approach at these schools (i.e., more remediation in the earlier grades).
Hey Mike,
I think you’ve hit on a huge problem. Striving to cover grade-level material for a state test undermines teaching math in a way that students behind grade level can truly understand. This year, I started with 4th grade material, and now, around Thanksgiving, I am essentially caught up to sixth grade level. Now, unfortunately, I have to choose between skimming through grade-level standards and not covering some (bye-bye, geometry).
It’s like all my blog reading is zeroing in on the same concept: http://kitchentablemath.blogspot.com/2012/11/two-years-is-two-years.html
Mike,
Great questions. As always, I think the answers vary a bit from KIPP region to region, but here are my general feelings…
Since KIPP is in 20 states and DC, and state tests are really an apples-to-oranges comparison at this point, MAP is actually what matters most for comparing schools in our network. Schools that do well on MAP are held up internally as the top-performing schools, and state tests don’t factor into that at all. State tests obviously matter a ton too for other reasons, but there is a lot of emphasis and focus on MAP within KIPP.
In general, we’re finding that teachers who do well at moving kids on the state test also have good results on MAP. It’s in no way a perfect correlation, but we rarely see divergent results like in your example. At least in Philadelphia, we look at both state test results and MAP as equal data points for evaluating the success of our schools (and teachers) and would take either of your examples as a pretty serious basis for exploring what is going on with that teacher and course.
Instead of tying two teachers together, couldn’t individual teachers receive incentives for all future learning for each student?
Like a salesman earning “residuals” monthly for years into the future, a teacher who focuses on the basics in 3rd grade can get a fraction of the incentive for student success in 4th, 5th, 6th, etc. After 5 years, teachers would be receiving residuals on 100+ students spread out over many different grades/teachers to smooth incentives.
What you are talking about – an assessment’s ability to distinguish between performance levels across the ability range – is called the ‘Test Information Function’. You can sometimes read about it in your state’s technical manual. Here’s a chapter of a book that talks about it: http://echo.edres.org:8080/irt/baker/chapter6.pdf
The NAEP runs into this, which is why they don’t spit back individual scores for students – if you gave the whole test to every kid, it would take 5 hours to administer!
http://www.nagb.org/assets/documents/publications/frameworks/tech2014-framework/ch_4/assessment.html
The hope, I think, is that modern assessments will adapt mid-session and provide better coverage across the ability spectrum. This is what MAP does. And to Ben’s point above, that’s why we give it so much weight in Newark – the growth estimates are really good for every kid, not just kids at/near the proficiency cutoff.
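Adaptive tests get that coverage by choosing each next item based on the answers so far. MAP’s actual item bank, selection rules, and scoring aren’t described here, so what follows is a generic computerized-adaptive-testing sketch under stated assumptions: 2PL items, a hypothetical item bank and simulated student, a standard normal prior with an expected a posteriori (EAP) ability estimate, and the common maximum-information rule for picking the next item.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct answer (assumed item model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Ability grid from -4 to +4 used for the posterior calculation.
GRID = [g / 10.0 for g in range(-40, 41)]


def eap_estimate(responses):
    """Expected a posteriori ability estimate under a standard normal prior.
    responses is a list of ((a, b), answered_correctly) pairs."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def next_item(theta_hat, bank, used):
    """Maximum-information rule: the unused item most informative at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))


def run_cat(bank, answers_correctly, n_items):
    """Give n_items adaptively; answers_correctly(item) simulates the student."""
    used, responses, theta_hat = [], [], 0.0
    for _ in range(n_items):
        i = next_item(theta_hat, bank, used)
        used.append(i)
        responses.append((bank[i], answers_correctly(bank[i])))
        theta_hat = eap_estimate(responses)
    return theta_hat, used


# Hypothetical bank: difficulties from -3.0 to +3.0 in steps of 0.5.
bank = [(1.0, k / 2.0) for k in range(-6, 7)]


def struggling(item):
    """Hypothetical struggling student: right only on items easier than -1.5."""
    return item[1] < -1.5


theta_hat, used = run_cat(bank, struggling, n_items=8)
print(f"estimate {theta_hat:+.2f}; difficulties given: {[bank[i][1] for i in used]}")
```

Each wrong answer pulls the estimate down, so the simulated struggling student gets routed toward the easier end of the bank instead of sitting through items far above their level. That is the property credited above for growth estimates that work for every kid, not just kids near the proficiency cutoff.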