Wishful thinking was only a minor factor in the massive, obvious, embarrassing error by conservative pundits who predicted that the 2012 presidential election would be a dead heat or even a Mitt Romney landslide. A profound misunderstanding of statistical distributions caused the humiliation of so many conservative bloggers, journalists, and campaign managers.

But before anyone sneers, thousands of managers trying to interpret analytics make that same mistake every day: confusing numbers with distributions.

Numbers report single facts (prices, distances, times), but distributions are sets of numbers, often expressed as graphs, describing situations (odds, possibilities, densities). Joe Scarborough, David Brooks, and dozens of other political journalists argued that, since polls leaned toward Barack Obama by only 1 to 2 percent (a number), statistical forecasts (i.e., distributions) predicting a better than 80 percent chance of an Obama victory couldn't be right. It had to be a dead heat. Unfortunately for the conservative pundits, directly comparing the numbers from raw data and the statistics describing a distribution is as meaningless as the famous score in Calvinball: 12 to Q.

When we say a variable has a probability distribution, what we mean is that there is a probability associated with every possible value of the variable. For example, if the variable is "total heads after four flips of a coin," the values can only be 0, 1, 2, 3, or 4, with this distribution:
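As a minimal sketch (not from the article, just the standard binomial formula), the four-flip distribution can be computed directly:

```python
from math import comb

# Probability distribution of "total heads after four flips of a fair coin":
# P(k heads) = C(4, k) / 2^4
n = 4
pmf = {k: comb(n, k) / 2**n for k in range(n + 1)}

for k, p in pmf.items():
    print(f"{k} heads: {p:.4f}")
# 0 heads: 0.0625, 1: 0.2500, 2: 0.3750, 3: 0.2500, 4: 0.0625
```

Note the shape: a single peak at 2 heads, falling away symmetrically on both sides.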

Although individual voters don't flip a coin, the math is the same: equal numbers of two possibilities in random order. Imagine the state of Normalia, which has exactly 1 million voters -- 500,000 supporters each for Obama and Romney. Here's the Normalia distribution:
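A sketch of the Normalia distribution, modeling the outcome as 1,000,000 independent fair coin flips -- an assumption on my part, since the article doesn't spell out its exact model. Log-gamma avoids the astronomically large integers a direct binomial coefficient would produce:

```python
from math import lgamma, log, exp, sqrt, pi

# Normalia: 1,000,000 voters, modeled here as independent fair coin flips
# (an assumption; the article's exact randomness model isn't spelled out).
n = 1_000_000

def binom_pmf(k: int, n: int) -> float:
    # log C(n, k) + n * log(1/2), computed via lgamma to avoid huge integers
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + n * log(0.5))

p_tie = binom_pmf(n // 2, n)           # the single most likely outcome
p_obama_majority = (1 - p_tie) / 2     # by symmetry, each side is equally likely

print(f"P(exact tie)      = {p_tie:.6f}")
print(f"P(Obama majority) = {p_obama_majority:.6f}")
```

The tie is the most likely single outcome, yet it is still very improbable (well under one in a thousand); almost all the probability sits in the two equal tails on either side of it.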

The single most likely outcome is a tie, and combined probabilities are equal on each side of the dashed red tie line. If all states (plus the District of Columbia) were Normalia, and Electoral College votes were distributed among them as evenly as possible (28 states with 11 and 23 states with 10), the distribution for Electoral College votes would have looked like this:
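That all-Normalia Electoral College distribution can be sketched by convolving the 51 state outcomes, one state at a time, under the article's even-split assumption (each state an independent fair coin flip):

```python
from collections import defaultdict

# Electoral College if every state were Normalia (a 50-50 coin flip) and the
# 538 electoral votes were spread as evenly as possible:
# 28 states worth 11 votes plus 23 states worth 10 votes (28*11 + 23*10 = 538).
state_evs = [11] * 28 + [10] * 23

dist = {0: 1.0}  # probability distribution over Obama's electoral-vote total
for ev in state_evs:
    new = defaultdict(float)
    for total, p in dist.items():
        new[total + ev] += p / 2   # Obama wins this state
        new[total] += p / 2        # Romney wins this state
    dist = dict(new)

p_obama_270 = sum(p for total, p in dist.items() if total >= 270)
print(f"P(Obama reaches 270) = {p_obama_270:.4f}")
```

The result is again a single central spike, symmetric about a 269-269 tie, with each side just under a 50 percent chance of reaching 270.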

Since the national popular vote felt close to 50-50 (the ratio was actually about 101-96), pundits of limited numeracy pictured a distribution like that of the United States of Normalia. This is par for the course among many managers. I've attended countless research-result presentations in which the apparent smallness of the differences set hands waving, dismissing the real, grainy, local lumpiness, everyone eager to get on with applying intuition and experience.

Just remember that distributions with a central spike overreact. A small change in individual preferences shrinks one tail, fattens and lengthens the other, and moves that central peak toward the fatter tail, while the majority line stays in the same place. (This is what statisticians call skew, and it refers to descriptive geometry, not liberal conspiracy.) Obama's advantage of about 2.5 percent would move his chances of getting a majority in Normalia from 50-50 to 53.9-46.1. Furthermore, repeated application of distributions is nonlinear. A 53.9 percent chance of a majority, applied across 51 Normalias, becomes a 60.1 percent chance of a majority in that imaginary, all-things-even Electoral College.
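The amplification effect can be sketched with a toy model. The article's 53.9 and 60.1 percent figures evidently rest on modeling assumptions (such as the degree of correlation between states) that aren't spelled out in the text; the sketch below simply assumes 51 fully independent contests, each won with probability 0.539, and shows that the chance of winning a majority of them is noticeably larger than the chance of winning any single one:

```python
from math import comb

# Toy model (an assumption, not the article's exact calculation):
# 51 independent contests, each won with probability p = 0.539.
# Chance of winning a majority of contests (at least 26 of 51):
p = 0.539
n = 51
p_majority = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(26, n + 1))
print(f"per-contest win probability: {p:.3f}   majority of 51: {p_majority:.3f}")
```

Under full independence the amplification is even stronger than the article's figure; correlation between contests pulls the aggregate back toward the per-contest number. Either way, the direction is the same: a small per-contest edge becomes a much larger aggregate edge.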

But none of the states was Normalia, and nothing was even. The lumpiness of the real world meant that Obama started with 237 electoral votes in the bag to Romney's 191; only the nine swing states and their 110 electoral votes were in dispute. Polls from a generally conservative source, just before the election, showed pro-Obama skews in seven swing states and pro-Romney skews in two.

Based on that data, the real distribution looked something like this:

And in that lumpy, real distribution, a 2.5 percent advantage in individual preferences equates to an 84.24 percent chance of an Electoral College win.

The real-world Electoral College graph is amazingly different from life in Normalia. Here's what you should do as you move toward a greater reliance on metrics.

Remember that what comes out of a distribution may bear little resemblance to the microdecisions it comprises.

Know the shape of the distribution.

Analyze to see where your goals fall on the distribution.

Be especially careful around successive distribution problems (like the translation of popular vote to electoral vote, the adoption of a tech standard across several platforms, or multiple wholesale/retail connections), because tiny differences can blow up fast.

Know the ground; most of what happens, happens locally and stochastically. (Globally, the average adult human has one testicle and one ovary, but locally, hardly anyone has met someone like that, except maybe at the Romney victory party.)

Louis, I'd say they saw the ground but not the implications. Kind of like recognizing that "well, it's a high-scoring game, we are only down by a touchdown and a field goal, and there are still five minutes on the clock" but then not getting to the conclusion "we have to play it out, but we are almost certain to lose." Not so much ignoring the facts -- the polls were very accurate this time -- but refusing to see what the facts meant, and instead insisting on just repeating whichever facts made you happiest.

I see what you are saying, John, so is it safe to say Republican pollsters simply did not understand what the underlying meaning of the distribution was in reality? And if this was the case, how can seasoned campaign managers make such a colossal blunder? Objectivity lost to partisan politics? Well, of course it was.

But I think it goes to what you and @rbaz were discussing earlier in this thread: the media has skewed reality to such a degree, and coupling that with a media pool that is at best passive and non-confrontational produces outcomes such as this past election. Am I the only one who thought this (the election) was over by half-time?

I have always held a healthy disdain for polls (especially in national elections) because they tend to repeat the same pattern: regardless of all the polls that came before, the one just before voting will most often be deemed a "close race." I have yet to see an election in my lifetime where this pattern veered far from that formula, which is a major reason I have no use for polls. As far as I am concerned, yet again polls and polling did not reflect what is really going on "on the ground."

I just can't believe this simple fact was missed by many so-called experts.

Louis, well, if you understand the thing being represented, that's a pretty good guard against many kinds of folly. And my purpose here is not to teach people how to do the math; there isn't space, time, or interest for that here. The idea is more to get people comfortable with asking for the math and having an idea of what it says when they get it. Kind of like how a wine columnist doesn't teach you to make wine, but tells you what to order and what to look for.

Lyndon, I think Krugman did a pretty solid job of explaining it, too. Another way to look at distributions is to think of them as functions that convert local and specific margins into overall probabilities. But a key point not to be lost is that distributions also apply to forecasting markets, liability, crime, war, sports -- any large-scale, wide-participation human activity. I guarantee that someone who is chuckling "silly Republicans" right now will make the same mathematical error within a day. (I hope to reduce the number of such errors, but I don't think they can be eliminated.)

Thanks, John, for explaining in part what happened to Republican pollsters with regard to understanding distributions, or the lack thereof. The method of analysis seems easy enough, yet many make this kind of mistake whenever the tool is in use.

I am not sure I understand it completely either, but I take pride in practicing your 5th tip - Knowing the ground. This alone can make up for numerous statistical shortcomings IMO.

John Barnes writes: "Wishful thinking was only a minor factor in the massive, obvious, embarrassing error by conservative pundits who predicted that the 2012 presidential election would be a dead heat or even a Mitt Romney landslide."

In a sense, the profound failure of GOP election prediction reflects a case of being hoist with their own petard. Karl Rove's vehement disbelief, witnessed by millions on live TV when Fox News analysts called Ohio for Obama, is iconic, and it seems to reflect what happens when you believe the fantasies of the whacko reality you have constructed and led others into.

In another sense, the GOP prediction failure represents a kind of 21st-century Inquisition. The GOP targeted venomous anger at both polls and analysts who dared to use math objectively and read the results, which suggested a rather solid Obama victory. This level of disbelief and rejection of science (math) reminds me of the pressure brought to bear on Galileo, forcing him to deny what his own scientific research and observations were telling him. Fortunately, for this election, the rightwing Inquisition simply fizzled.

Nate Silver of the NYT's 538 blog, a platform mainly for the presentation of the results of his own political analytics, has been widely hailed for the accuracy of his math-based predictions. For example, see:

Here are some interesting quotes: Silver came through with flying colors, as Obama performed nearly exactly the way he said he would. The public recognition was immediate.

"You know who won the election tonight? Nate Silver," Rachel Maddow said on MSNBC. Even Fox News tipped its cap to Silver.

Others said that the results could force a bit of a sea change in political journalism.

"What does this victory mean?" Mashable's Chris Taylor wrote. "That mathematical models can no longer be derided by "gut-feeling" pundits. That Silver's contention -- TV pundits are generally no more accurate than a coin toss -- must now be given wider credence."

Silver, of course, became a particularly hated target of the rightwing anti-science blitz that attempted to portray some kind of mysterious Romney "surge" till the bitter end.

Economist and NYT columnist Paul Krugman discussed much of this (somewhat along the lines of John Barnes's explanation) in a Nov. 4th blog entry:

Some of Krugman's interesting points: First of all, from what I can see a lot of people have trouble with the distinction between probabilities and vote margins. ...

Second, people clearly have a problem with randomness — with the fact that any poll, no matter how carefully conducted, has a margin of error. (And the true margins of error are surely larger than the statistical measure always reported, since sampling error isn't the only way a poll can go wrong). ...

What this means is that if you look at all the polls, you're very likely to find one or two that tell you what you want to hear... even good pollsters will produce an occasional off result, and you really, really don't want to start picking and choosing those off results to make yourself feel good.

...Oh, and a third point: those margins of error are for any one poll. An average of many polls will have a much smaller standard error.

Seth, accuracy wasn't really an issue here; it's just that when you have successive close-numbers events and one side needs fewer wins than the other, the side that needs fewer wins has a massive advantage. As the IRA communicated to Margaret Thatcher after a failed assassination attempt, "You have to be lucky every time. We only have to be lucky once."

I saw the articles explaining an 80% chance of winning. It always amazes me how just a couple of percentage points here and there can cause major events to go in one direction. A single state poll may have a large margin of error, but that margin shrinks considerably when you aggregate different polls together, since aggregation creates a much larger sample size.
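That margin-shrinking effect is easy to sketch. Under the simplifying assumption that the polls are independent, identically conducted simple random samples of a roughly 50-50 race, pooling ten polls of 1,000 respondents behaves like one poll of 10,000:

```python
from math import sqrt

def margin_of_error(n: int, p: float = 0.5) -> float:
    # 95% margin of error for a proportion from a simple random sample of size n
    return 1.96 * sqrt(p * (1 - p) / n)

single = margin_of_error(1_000)        # one poll of 1,000 respondents
pooled = margin_of_error(10 * 1_000)   # ten such polls pooled together

print(f"one poll:  +/- {single:.1%}")
print(f"ten polls: +/- {pooled:.1%}")
```

The pooled margin is smaller by a factor of the square root of ten -- roughly 3.1 percent down to about 1 percent. (As Krugman notes above, sampling error isn't the only way a poll can go wrong, so real pooled errors shrink less than this idealized model suggests.)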
