Wishful thinking was only a minor factor in the massive, obvious, embarrassing error by conservative pundits who predicted that the 2012 presidential election would be a dead heat or even a Mitt Romney landslide. A profound misunderstanding of statistical distributions caused the humiliation of so many conservative bloggers, journalists, and campaign managers.
But before anyone sneers, consider that thousands of managers trying to interpret analytics make the same mistake every day: confusing numbers with distributions.
Numbers report single facts (prices, distances, times), but distributions are sets of numbers, often expressed as graphs, describing situations (odds, possibilities, densities). Joe Scarborough, David Brooks, and dozens of other political journalists argued that, since polls leaned toward Barack Obama by only 1 to 2 percent (a number), statistical forecasts (i.e., distributions) predicting a better than 80 percent chance of an Obama victory couldn't be right. It had to be a dead heat. Unfortunately for the conservative pundits, directly comparing the numbers from raw data and the statistics describing a distribution is as meaningless as the famous score in Calvinball: 12 to Q.
When we say a variable has a probability distribution, what we mean is that there is a probability associated with every possible value of the variable. For example, if the variable is "total heads after four flips of a coin," the values can only be 0, 1, 2, 3, or 4, and for a fair coin their probabilities are 1/16, 4/16, 6/16, 4/16, and 1/16, respectively. That set of probabilities is the distribution:
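The four-flip case is small enough to compute directly. Here is a minimal sketch in Python, assuming a fair coin and independent flips; the function name `heads_distribution` is illustrative and not from the original analysis.

```python
from math import comb


def heads_distribution(flips=4, p_heads=0.5):
    """Return {head_count: probability} for `flips` independent tosses."""
    return {
        k: comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(flips + 1)
    }


# One probability for every possible value of the variable.
for heads, prob in heads_distribution().items():
    print(f"{heads} heads: {prob:.4f}")
```

The five probabilities sum to 1, and the middle value (two heads) is the single most likely outcome, which is the same shape the tied state described next would have.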
Although individual voters don't flip a coin, the math is the same: equal numbers of two possibilities in random order. Imagine the state of Normalia, which has exactly 1 million voters -- 500,000 supporters each for Obama and Romney. Here's the Normalia distribution:
The single most likely outcome is a tie, and combined probabilities are equal on each side of the dashed red tie line. If all states (plus the District of Columbia) were Normalia, and Electoral College votes were distributed among them as evenly as possible (28 states with 11 and 23 states with 10), the distribution for Electoral College votes would have looked like this:
Since the national popular vote felt close to 50-50 (the ratio was actually about 101-96), pundits of limited numeracy pictured a distribution like that of the United States of Normalia. This is par for the course among many managers. I've attended countless research presentations in which the apparent smallness of the differences made people wave their hands, dismiss the real, grainy, local lumpiness, and hurry on to applying intuition and experience.
Just remember that distributions with a central spike overreact. A small change in individual preferences shrinks one tail, fattens and lengthens the other, and moves that central peak toward the fatter tail, while the majority line stays in the same place. (This is what statisticians call skew, and it refers to descriptive geometry, not liberal conspiracy.) Obama's advantage of about 2.5 percent would move his chances of getting a majority in Normalia from 50-50 to 53.9-46.1. Furthermore, repeated application of distributions is nonlinear. A 53.9 percent chance of a majority, applied across 51 Normalias, becomes a 60.1 percent chance of a majority in that imaginary, all-things-even Electoral College.
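The nonlinearity of repeated application can be explored with a short calculation. The sketch below is not the model behind the figures above, which is not spelled out here. It assumes that each of the 51 Normalias is won independently with the same probability, and that electoral votes are split 28 states with 11 and 23 states with 10. Under those assumptions it builds the exact distribution of electoral votes one state at a time. The names `electoral_distribution` and `chance_of_majority` are illustrative, and the result is not guaranteed to reproduce the 60.1 percent quoted above.

```python
BLOCKS = [11] * 28 + [10] * 23  # 51 states, 538 electoral votes in total
MAJORITY = 270


def electoral_distribution(blocks, p_win):
    """dist[v] = probability of holding exactly v electoral votes.

    Assumes every state is won independently with probability p_win.
    """
    dist = [1.0]  # before any state is counted: 0 votes with certainty
    for votes in blocks:
        updated = [0.0] * (len(dist) + votes)
        for held, prob in enumerate(dist):
            updated[held] += prob * (1 - p_win)      # this state is lost
            updated[held + votes] += prob * p_win    # this state is won
        dist = updated
    return dist


def chance_of_majority(p_win):
    """Probability of reaching at least MAJORITY electoral votes."""
    return sum(electoral_distribution(BLOCKS, p_win)[MAJORITY:])


for p in (0.50, 0.52, 0.539):
    print(f"per-state chance {p:.3f} -> majority chance {chance_of_majority(p):.3f}")
```

Running it with a few per-state probabilities shows how the chance of a national majority moves away from 50-50 faster than the per-state chance does.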
But none of the states was Normalia, and nothing was even. The lumpiness of the real world meant that Obama started with 237 electoral votes in the bag to Romney's 191; only the nine swing states and their 110 electoral votes were in dispute. Polls from a generally conservative source, just before the election, showed pro-Obama skews in seven swing states and pro-Romney skews in two.
Based on that data, the real distribution looked something like this:
And in that lumpy, real distribution, a 2.5 percent advantage in individual preferences equates to an 84.24 percent chance of an Electoral College win.
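To see how much the head start alone matters, consider a deliberately simple sketch. It does not use the poll data behind the 84.24 percent figure and will not reproduce it. It assumes the nine swing states are the ones most commonly listed as 2012 battlegrounds (the text above does not name them), uses their 2012 electoral vote counts, which sum to 110, and treats every one of them as an independent coin flip. The name `win_probability` and the `coin_flip` input are illustrative.

```python
# Assumption: these nine states are not named in the article; they are the
# commonly cited 2012 battlegrounds, whose electoral votes total 110.
SWING_VOTES = {
    "Florida": 29, "Ohio": 18, "North Carolina": 15, "Virginia": 13,
    "Wisconsin": 10, "Colorado": 9, "Iowa": 6, "Nevada": 6,
    "New Hampshire": 4,
}
SAFE_OBAMA = 237   # electoral votes "in the bag" per the article
MAJORITY = 270


def win_probability(state_probs):
    """Chance of reaching MAJORITY given per-state win probabilities.

    Assumes the swing states are decided independently of one another.
    """
    dist = [1.0]  # dist[v] = probability of winning exactly v swing votes
    for state, votes in SWING_VOTES.items():
        p = state_probs[state]
        updated = [0.0] * (len(dist) + votes)
        for held, prob in enumerate(dist):
            updated[held] += prob * (1 - p)
            updated[held + votes] += prob * p
        dist = updated
    needed = MAJORITY - SAFE_OBAMA  # 33 of the 110 disputed votes
    return sum(dist[needed:])


# Hypothetical input: every swing state is a pure toss-up.
coin_flip = {state: 0.5 for state in SWING_VOTES}
print(f"Chance of a majority with toss-up swing states: {win_probability(coin_flip):.3f}")
```

Even with every disputed state set to a toss-up, the candidate who needs only 33 of 110 votes is well ahead of 50-50; replacing the hypothetical `coin_flip` values with poll-based probabilities would shift the result further.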
The real-world Electoral College graph is amazingly different from life in Normalia. Here's what you should do as you move toward a greater reliance on metrics.
- Remember that what comes out of a distribution may bear little resemblance to the microdecisions it comprises.
- Know the shape of the distribution.
- Analyze to see where your goals fall on the distribution.
- Be especially careful around successive distribution problems (like the translation of popular vote to electoral vote, the adoption of a tech standard across several platforms, or multiple wholesale/retail connections), because tiny differences can blow up fast.
- Know the ground; most of what happens, happens locally and stochastically. (Globally, the average adult human has one testicle and one ovary, but locally, hardly anyone has met someone like that, except maybe at the Romney victory party.)