This blog is about 3 stories.

1. The start-up year for a very different sort of Graduate School of Education. It's a tiny subset of...
2. ...The much larger, national effort to transform teaching and teachers. That is a big subset of...
3. ...A multi-kajillion-dollar effort to improve the ludicrous odds (7% or so) of a poor kid ever getting a college diploma.

Nate Silver, 34 out of 50

Posted: November 8th, 2012 | 10 Comments

Andrew Mooney is a Harvard student. He wrote a great blog post today:

For one small subcommunity of America, the man who benefited the most from the country’s decisions at the polls on Tuesday was not Barack Obama—it was Nate Silver, statistician and creator of the FiveThirtyEight blog on the New York Times’ website.

Based on current election returns, Silver correctly predicted the outcomes of all 50 states, with the result in Florida still pending. Given his track record—he got 49 out of 50 right in 2008—Silver appears to have ushered in a new level of credibility for statistical analysis in politics.

Not so fast, says Mooney.

But there may be a better way of evaluating Silver’s predictions than a binary right-wrong analysis…. Using this methodology (examining margin of error), Silver’s record looks a lot less clean. The actual election results in 16 states fell outside the margin of error Silver allotted himself in his projections, reducing his total to 34-for-50, or 68 percent.

He was furthest off in Mississippi, which wasn’t nearly as lopsided as he predicted, and West Virginia, which voted more Republican than expected. Of course, Silver was still within two percent on 19 states, an impressive feat in itself.
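Mooney's two yardsticks (right-wrong vs. inside-the-margin) can be made concrete with a small Python sketch. The state numbers below are made up for illustration and are not Silver's forecasts. The sketch also assumes each stated margin of error is one standard deviation of a normally distributed forecast error; Mooney does not say how wide the interval was meant to be. Under that assumption, even a perfectly calibrated forecaster should land inside the interval only about 68% of the time, or roughly 34 of 50 states.

```python
import math

# Made-up illustrative forecasts, NOT Silver's actual numbers.
# Each entry: (state, forecast margin, stated uncertainty, actual margin), all in
# percentage points. Positive margins favor candidate A, negative favor candidate B.
FORECASTS = [
    ("State 1", 8.0, 3.0, 11.5),
    ("State 2", -2.0, 2.5, -0.5),
    ("State 3", 0.5, 2.0, -0.9),
    ("State 4", -15.0, 4.0, -22.0),
]


def binary_score(forecasts):
    """Right-wrong scoring: count states where the forecast picked the actual winner."""
    return sum(1 for _, pred, _, actual in forecasts if (pred > 0) == (actual > 0))


def margin_score(forecasts, k=1.0):
    """Margin-of-error scoring: count results within k * stated uncertainty of the forecast."""
    return sum(1 for _, pred, unc, actual in forecasts if abs(actual - pred) <= k * unc)


def expected_coverage(k):
    """Share of results a well-calibrated normal forecast puts within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    print(binary_score(FORECASTS))              # 3 (State 3 picked the wrong winner)
    print(margin_score(FORECASTS))              # 2 (States 2 and 3 fall inside +/- 1 SD)
    print(margin_score(FORECASTS, k=2.0))       # 4
    print(round(50 * expected_coverage(1), 1))  # 34.1 of 50 states expected inside +/- 1 SD
    print(round(50 * expected_coverage(2), 1))  # 47.7 of 50 states expected inside +/- 2 SD
```

Note State 3 in the made-up data: right-wrong scoring counts it as a miss even though the result landed comfortably inside the stated uncertainty. That is the coin-flip problem with binary scoring, and it is why the width of the margin matters before calling 34-for-50 good or bad.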

Big Picture: Geeks Versus “Experts”

Moneyball is the mostly real-life story of Geek versus Expert. A baseball general manager (Billy Beane, played by Brad Pitt in the movie) relies on experts — tobacco-chewing scouts who’ve been in the game their whole lives, guys like Grady Fuson.

Beane hires a geek, a chubby recent Yale grad (Peter Brand, played by Jonah Hill, a fictionalized version of the real-life Paul DePodesta). New decisions are made.

Experts get angry. They’ve often spent a lifetime acquiring experience. “Feel.” How dare the young whippersnapper challenge their author-it-ay? Beane fires Fuson.

(FYI: the firing was a Hollywood flourish. The real-life story is here.)

The A’s become a good team with ballplayers that experts didn’t value, hidden gems only revealed by numbers.

* * *

That story just played out again on Tuesday, on a different ballfield. Karl Rove, George Will, Dick Morris, Michael Barone, and Peggy Noonan predicted a Romney landslide right before the election. They were spectacularly wrong.

Silver and a bunch of other geeks, like Mark Blumenthal at Pollster.com and Sam Wang at Princeton, largely got the story right.

As I observed last week, one reason some experts err is that they choose to live in a hyper-partisan bubble. They purposefully avoid data that tells a story they don’t want to hear.

All the geeks did was work with public polls that were easily available on the Internet. Yet conservative media often simply played up “Romney ahead” polls and didn’t mention “Obama ahead” polls.

* * *

K-12 has its own Nate Silvers. Two I know are Tom Kane and Roland Fryer. Those guys were never schoolteachers. They look at numbers. Then sometimes they say things that make the experts in our field quite angry.

For example, Tom Kane showed that expert observers (whether principals or teachers, any veteran who watches a class and then fills out a scoring rubric) are not very good at predicting which teachers help kids make the most growth. The eyeball test does not work well. That is troubling, because the eyeball test remains most of what we do to evaluate teachers.

Meanwhile, sometimes data allows individuals to do their jobs better. Case in point: Ross Trudeau’s blog here.

Big Picture: Geeks are on the rise. They will continue to battle experts, but increasingly decision-makers will listen to geeks.

But beneath that Big Picture, there are 2 Big Caveats.

1. First, what Andrew Mooney writes about political geeks also applies to K-12 geeks:

The takeaway here is that, while Silver’s work the last four years has been impressive, he is not a mysterious wizard—for example, both the Huffington Post and Princeton’s Sam Wang had similarly accurate results. He is also not infallible, and he would be the first to admit it.

Forecasting is never an area where we should expect 100 percent accuracy, and though Silver’s work is bringing a lot of positive attention to statistical analysis in general, it’s important that people keep their expectations of its applications realistic.

2. The Bigger Caveat, in my opinion: Silver, Bill James, et al. got to where they are because they are the best geeks. Mooney’s point is about the limits of the best quants.

There are many geeks who aren’t very good at their work, by any definition. They crunch numbers, but mess up some key thinking.

And I worry about that in the K-12 field. So-so and bad geeks proliferate.

* * *

To sum up:

1. Increasingly, K-12 decision-makers are open to “using numbers” and the geeks who crunch ’em. They’re not reflexively anti-geek and pro-“expert.” Which is good. If they hire the best geeks, and so long as they understand the limits of geekery, kids will be better off.

2. However, often the K-12 decision-makers hire mediocre geeks, or worse. So the result is not Moneyball, not stats-based breakthroughs that help the kiddos learn, or that help teachers improve. Instead, nothing improves over the so-so or bad “expert” decision-making that has long been in place. And sometimes things get worse.


10 Comments on “Nate Silver, 34 out of 50”

  1. Tom Hoffman said at 2:19 pm on November 8th, 2012:

    Michael,

    You REALLY should read The Signal and the Noise, or at least skim most of it.

    Silver emphasizes that forecasters without expertise in their subject areas are rarely successful. If you don’t understand, intimately, the problem domain and have a real theory of how it works, you just oversimplify.

    The core of the book is a discussion of Bayesian analysis which seems to me to be extremely relevant to the teacher and school evaluation debates. Basically, the Bayesian approach is all about integrating a stream of individually noisy or error-prone data into a prediction that gradually becomes more certain. Like a sequence of individual polls, for example.

    I’ll refrain from trying to explain it in detail, but, for example, in teacher evaluation you might maintain an ongoing probability that each teacher was a Bad Teacher. As each observation, VAM score, student survey, whatever came in, it would modify that probability, based on the reliability of each type of data.

    It makes way, way more sense than the “average together 20% of this, 30% of that, etc., every year and assign an annual rating” systems that are being forced upon us now.

    He’s also very skeptical about “big data” as a panacea in general, which I gather a lot of the current ed tech hype is based on.

  2. Michael Goldstein said at 3:53 pm on November 8th, 2012:

    Great comment Tom.

    You are remarkably well-read!

    I will add your rec to my list. I just finished the last book you recommended a month ago….

  3. mathteacher said at 9:43 pm on November 8th, 2012:

    Silver could make predictions about the test scores of schools in coming years to make it easier for parents to make choices, at least along those lines.

    Which brings me to a somewhat tangential point: right now, Boston Public is trying to figure out how to redo its student distribution system. The initial proposals seemed to be designed mainly to reduce the amount of time kids sit on the bus. Newer proposals, thankfully, seem more focused on equity in terms of access to the best schools. However, it really bugs me that people are looking purely at overall test results (either proficiency or SGP). I’d bet Silver would recognize that for most schools, A+P is correlated pretty closely with the percentage of non-low-income kids in the school. In his terms, this is bias. Unfortunately, everyone talks about schools with fewer low-income kids and (marginally) higher test scores as the great schools in the district. Isn’t this like saying that a poll that has Romney winning the election and is biased towards Republicans is a good predictor of reality? What I think we should be looking at are the schools that break away from the best-fit line and show better results with low-income kids. Those should be the model schools. I wish I could be more coherent about this, but in the end, I think most people who look at school results are not recognizing the inherent bias in the polling numbers…

  4. Andrew said at 11:17 pm on November 8th, 2012:

    Not that it negates the point that you are making here (which I agree with), but to defend my guy’s reputation – Andrew Mooney has some egg on his face. Check his update:
    “assuming that his projected margin of error figures represent 95 percent confidence intervals, which it is likely they did, Silver performed just about exactly as well as he would expect to over 50 trials. Wizard, indeed.”

    Nate actually did a phenomenal job quantifying the uncertainty in his forecast – 68% of outcomes inside of 1 standard error, and 48/50 (96%) within 2. Wizard indeed – he hit the 68-95-99.7 rule dead on (http://en.wikipedia.org/wiki/68-95-99.7_rule).

    Getting 50/50 states right is somewhat superficial – it requires any 51/49 coin flip events (eg, Florida) to break your way. Quantifying the uncertainty in the forecast, though – that’s a hell of an accomplishment.

  5. Scott Seider said at 9:40 am on November 9th, 2012:

    This may be an obvious point, but, in forecasting the election or trying to improve a baseball team, everyone (experts and geeks alike) is on the same page about what the outcome variable is– the presidential winner, in one case, and ‘wins’ in the other. In contrast, geeks and experts in education are still, to some extent, arguing about what the right outcome variables are.

  6. Dai said at 10:10 am on November 9th, 2012:

    Great post. We need more (top-tier) geeks looking at higher ed productivity & outcomes!

  7. Michael Goldstein said at 10:51 am on November 9th, 2012:

    Thanks Andrew for the update! My brother Steve emailed me too, thanks bro.

    Scott:

    Yep. But the coin of the realm is state test score growth, no?

    Even many who disagree with growth scores use growth scores to complain about policies they disagree with (for example, anti-test folks using test scores to show, correctly, that the average charter is not so hot).

    Broader agreement from people that college completion and labor market outcomes matter more than test scores, but those have historically been hard to know.

    High school completion, too.

  8. Michael Bower said at 9:01 pm on November 11th, 2012:

    Great post. If you have the time, you should check out the Sports Analytics Conference held each year at MIT (the Sloan business school runs it). It is a conference based on the rise and importance of analytics in sports. Bill James came to it last year. It is a celebration of geeks. There is a really cool research paper competition portion of the conference where students present new analytical ideas and practices. Maybe some cross-learning possibilities as well…

  9. Vivek Rao said at 4:58 pm on December 19th, 2012:

    Getting caught up on the blog, Mike… enjoying this and other posts. What I have taken from Moneyball, the parts of Signal and the Noise that I’ve read so far, and the most recent lengthy article about Daryl Morey of the Rockets (in SI, I think) is this (and it’s not really a revolutionary point as much as an intuitive point that’s important to remember): Data-crunching and statistics, if used properly, can be incredibly valuable, but the value of any given model or projection or statistic depends heavily on whether the right/important variables are incorporated and whether the wrong/unimportant variables are screened out and controlled for. And that’s where some of the more five-tool, baseball-scout insight can be useful. We want number crunchers developing ways of measuring teacher performance, but I think we also want the Charlie Sposatos of the world (i.e., master educators/scouts with a deeper understanding of some of the less tangible signs that a teacher is teaching well or a student is learning a lot) to trust the value of the math and to think about what exactly existing models/data-tracking are capturing or failing to capture. And vice versa. (For example, a baseball scout can tell you if a player’s seven walks in ten games was the product of a good batting eye or terrible opposing pitching, but in the past wouldn’t have bothered to care. It’s fascinating to think how the value of classroom observation might be truly unlocked by a better understanding of education data.)

    It seems like in baseball, basketball, and politics, the timeline is (1) geeks don’t exist, (2) geeks exist but use only the bluntest, crudest tools (RBIs, points per game, national polls) and therefore don’t threaten experts/scouts, (3) geeks hone their tools and piss off the experts/scouts, who demean/minimize the importance of geeks, (4) geeks get really good results and scare everyone else into buying into geekery to an extent, (4.5, maybe) people buy into geekery too much, or at least too blindly, failing to separate out the noise or understand what the signal means, and (5, hopefully) a relatively harmonious balance develops, as geekery is informed by expertise/scouting and expertise/scouting is informed by geekery. I’m certainly curious to see how this plays out in the education sphere. You know a lot more about this than I do, but from my limited knowledge, it seems like we’re still in the “experts demean and minimize the importance of geeks” phase; perhaps, though, geekery gaining broader acceptance in a number of other high-profile, traditionally geek-averse, expert-dominated fields

  10. Vivek Rao said at 5:02 pm on December 19th, 2012:

    [oops, didn't finish]

    …perhaps, though, geekery gaining broader acceptance in a number of other high-profile, traditionally geek-averse, expert-dominated fields will accelerate the timeline in the education world. I’m probably being a tad bit too optimistic, though.
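Tom Hoffman's comment describes a Bayesian alternative to fixed-weight annual ratings: keep a running probability that a teacher is weak and let each new piece of evidence move it according to how reliable that kind of evidence is. The Python below is a minimal sketch of that idea (Bayes' rule applied one signal at a time), not Silver's model or any district's actual system. Everything in it is an assumption: each observation, VAM score, or survey is reduced to a yes/no "looks weak" flag; the hit and false-alarm rates that stand in for each source's reliability are invented numbers; and the signals are treated as independent once the teacher's true type is known. The names `Signal`, `update`, and `run` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """One piece of evidence about a teacher (hypothetical model, invented rates)."""
    source: str              # e.g. "observation", "VAM score", "student survey"
    flagged: bool            # True if this evidence looked like a weak teacher
    hit_rate: float          # assumed P(flagged | teacher really is weak)
    false_alarm_rate: float  # assumed P(flagged | teacher is not weak)


def update(prior: float, signal: Signal) -> float:
    """Return P(weak | this signal), given P(weak) before it, via Bayes' rule."""
    if signal.flagged:
        like_weak, like_ok = signal.hit_rate, signal.false_alarm_rate
    else:
        like_weak, like_ok = 1.0 - signal.hit_rate, 1.0 - signal.false_alarm_rate
    weighted_weak = like_weak * prior
    return weighted_weak / (weighted_weak + like_ok * (1.0 - prior))


def run(prior: float, signals: list[Signal]) -> list[float]:
    """Apply signals in order; return the probability after each one."""
    history = []
    p = prior
    for s in signals:
        p = update(p, s)  # the previous posterior becomes the next prior
        history.append(p)
    return history


# Invented example: a noisy observation, a more reliable VAM flag, then a clean survey.
EXAMPLE_SIGNALS = [
    Signal("observation", True, hit_rate=0.6, false_alarm_rate=0.4),
    Signal("VAM score", True, hit_rate=0.7, false_alarm_rate=0.2),
    Signal("student survey", False, hit_rate=0.5, false_alarm_rate=0.2),
]

if __name__ == "__main__":
    for sig, p in zip(EXAMPLE_SIGNALS, run(0.10, EXAMPLE_SIGNALS)):
        print(f"after {sig.source}: P(weak) = {p:.3f}")
    # Prints P(weak) = 0.143, 0.368, 0.267 in turn.
```

Because the invented observation flag is nearly as likely for a strong teacher as for a weak one (0.6 vs. 0.4), it nudges the probability only from 0.10 to about 0.14; the more reliable VAM flag moves it much further, and a clean survey pulls it back down. An annual weighted average, by contrast, starts fresh each year instead of carrying the accumulated evidence forward.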

