Wishful thinking was only a minor factor in the massive, obvious, embarrassing error by conservative pundits who predicted that the 2012 presidential election would be a dead heat or even a Mitt Romney landslide. A profound misunderstanding of statistical distributions caused the humiliation of so many conservative bloggers, journalists, and campaign managers.
But before anyone sneers, thousands of managers trying to interpret analytics make that same mistake every day: confusing numbers with distributions.
Numbers report single facts (prices, distances, times), but distributions are sets of numbers, often expressed as graphs, describing situations (odds, possibilities, densities). Joe Scarborough, David Brooks, and dozens of other political journalists argued that, since polls leaned toward Barack Obama by only 1 to 2 percent (a number), statistical forecasts (i.e., distributions) predicting a better than 80 percent chance of an Obama victory couldn't be right. It had to be a dead heat. Unfortunately for the conservative pundits, directly comparing the numbers from raw data and the statistics describing a distribution is as meaningless as the famous score in Calvinball: 12 to Q.
When we say a variable has a probability distribution, we mean that there is a probability associated with every possible value of the variable. For example, if the variable is "total heads after four flips of a coin," the values can only be 0, 1, 2, 3, or 4, with this distribution: 0 heads, 1/16 (6.25 percent); 1 head, 4/16 (25 percent); 2 heads, 6/16 (37.5 percent); 3 heads, 4/16 (25 percent); 4 heads, 1/16 (6.25 percent).
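The coin-flip probabilities can be checked by counting. Each of the 2^4 = 16 heads/tails sequences is equally likely, and the number of sequences with k heads is the binomial coefficient C(4, k). A minimal Python sketch, assuming a fair coin (the function name `heads_distribution` is just illustrative):

```python
from fractions import Fraction
from math import comb


def heads_distribution(flips: int) -> dict[int, Fraction]:
    """Exact probability of each possible head count for a fair coin."""
    total = 2 ** flips  # number of equally likely heads/tails sequences
    # comb(flips, k) counts the sequences containing exactly k heads
    return {k: Fraction(comb(flips, k), total) for k in range(flips + 1)}


for heads, p in heads_distribution(4).items():
    label = "head" if heads == 1 else "heads"
    print(f"{heads} {label}: {p} ({float(p):.2%})")
```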
Although individual voters don't flip a coin, the math is the same: equal numbers of two possibilities in random order. Imagine the state of Normalia, which has exactly 1 million voters -- 500,000 supporters each for Obama and Romney. Here's the Normalia distribution:
The single most likely outcome is a tie, and combined probabilities are equal on each side of the dashed red tie line. If all states (plus the District of Columbia) were Normalia, and Electoral College votes were distributed among them as evenly as possible (28 states with 11 and 23 states with 10), the distribution for Electoral College votes would have looked like this:
Since the national popular vote felt close to 50-50 (the ratio was actually about 101-96), pundits of limited numeracy pictured a distribution like that of the United States of Normalia. This is par for the course among many managers. I've attended countless research presentations in which the apparent smallness of the differences set hands waving, dismissing the real, grainy, local lumpiness in a rush to get on with applying intuition and experience.
Just remember that distributions with a central spike overreact. A small change in individual preferences shrinks one tail, fattens and lengthens the other, and moves that central peak toward the fatter tail, while the majority line stays in the same place. (This is what statisticians call skew, and it refers to descriptive geometry, not liberal conspiracy.) Obama's advantage of about 2.5 percent would move his chances of getting a majority in Normalia from 50-50 to 53.9-46.1. Furthermore, repeated application of distributions is nonlinear. A 53.9 percent chance of a majority, applied across 51 Normalias, becomes a 60.1 percent chance of a majority in that imaginary, all-things-even Electoral College.
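The compounding step can be sketched in Python. This sketch assumes, as a simplification the article does not spell out, that each of the 51 Normalias (28 with 11 votes, 23 with 10) is won independently with the same probability; the helper names are hypothetical. Because the article doesn't describe its own model, this sketch should not be expected to reproduce the 60.1 percent figure; it illustrates only the direction of the effect, in which a per-state edge above 50 percent becomes a bigger edge in the chance of reaching 270.

```python
def ev_distribution(weights, p):
    """Probability of each possible electoral-vote total, assuming every
    state is won independently with the same probability p."""
    dist = [1.0]  # dist[v] = probability of exactly v votes so far
    for w in weights:
        new = [0.0] * (len(dist) + w)
        for v, prob in enumerate(dist):
            new[v] += prob * (1 - p)  # state lost: total unchanged
            new[v + w] += prob * p    # state won: add its w votes
        dist = new
    return dist


def majority_probability(weights, p, needed=270):
    """Chance of reaching at least `needed` electoral votes."""
    return sum(ev_distribution(weights, p)[needed:])


# United States of Normalia: 28 units with 11 votes, 23 with 10 (538 total).
normalia = [11] * 28 + [10] * 23

for p in (0.5, 0.539):
    print(f"per-state chance {p:.3f} -> chance of 270+: "
          f"{majority_probability(normalia, p):.3f}")
```

At p = 0.5 the result lands slightly below 50 percent, because a 269-269 tie is possible and counts as neither side reaching 270.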
But none of the states was Normalia, and nothing was even. The lumpiness of the real world meant that Obama started with 237 electoral votes in the bag to Romney's 191; only the nine swing states and their 110 electoral votes were in dispute. Polls from a generally conservative source, just before the election, showed pro-Obama skews in seven swing states and pro-Romney skews in two.
Based on that data, the real distribution looked something like this:
And in that lumpy, real distribution, a 2.5 percent advantage in individual preferences equates to an 84.24 percent chance of an Electoral College win.
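The lumpy version can be sketched the same way: start each candidate from his safe votes and add up only the contested ones. Obama's 237 safe votes meant he needed just 33 of the 110 swing votes, while Romney, starting from 191, needed 79. The sketch below assumes the nine swing states were the commonly cited Florida, Ohio, North Carolina, Virginia, Wisconsin, Colorado, Iowa, Nevada, and New Hampshire (their 2012 electoral votes do total 110), treats each as an independent coin flip, and uses a hypothetical helper name. It is not the model behind the 84.24 percent figure.

```python
# Assumed swing states and their 2012 electoral votes (totaling 110).
SWING_2012 = {"FL": 29, "OH": 18, "NC": 15, "VA": 13, "WI": 10,
              "CO": 9, "IA": 6, "NV": 6, "NH": 4}


def win_probability(safe_votes, swing_votes, swing_probs, needed=270):
    """Chance that safe votes plus swing votes won reach `needed`,
    treating each swing state as an independent win/loss."""
    dist = {0: 1.0}  # swing votes won so far -> probability
    for votes, p in zip(swing_votes, swing_probs):
        new = {}
        for won, prob in dist.items():
            new[won] = new.get(won, 0.0) + prob * (1 - p)             # lose state
            new[won + votes] = new.get(won + votes, 0.0) + prob * p  # win state
        dist = new
    return sum(prob for won, prob in dist.items() if safe_votes + won >= needed)


votes = list(SWING_2012.values())
coin_flips = [0.5] * len(votes)  # hypothetical: every swing state a toss-up
print("Obama: ", win_probability(237, votes, coin_flips))
print("Romney:", win_probability(191, votes, coin_flips))
```

Even with every contested state a pure toss-up, the candidate who needs only 33 of the 110 votes comes out a heavy favorite; the two printed probabilities sum to slightly less than 1 because a 269-269 tie is possible.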
The real-world Electoral College graph is amazingly different from life in Normalia. Here's what you should do as you move toward a greater reliance on metrics.
Remember that what comes out of a distribution may bear little resemblance to the microdecisions it comprises.
Know the shape of the distribution.
Analyze to see where your goals fall on the distribution.
Be especially careful around successive distribution problems (like the translation of popular vote to electoral vote, the adoption of a tech standard across several platforms, or multiple wholesale/retail connections), because tiny differences can blow up fast.
Know the ground; most of what happens, happens locally and stochastically. (Globally, the average adult human has one testicle and one ovary, but locally, hardly anyone has met someone like that, except maybe at the Romney victory party.)
Louis, I'd say they saw the ground but not the implications. It's like recognizing that "it's a high-scoring game, we're only down by a touchdown and a field goal, and there are still five minutes on the clock" but never reaching the conclusion "we have to play it out, but we are almost certain to lose." It wasn't so much ignoring the facts (the polls were very accurate this time) as refusing to see what the facts meant, and instead insisting on repeating whichever facts made you happiest.
I see what you are saying, John. So is it safe to say Republican pollsters simply did not understand what the distribution meant in reality? And if so, how could seasoned campaign managers make such a colossal blunder? Objectivity lost to partisan politics? Well, of course it was.
But I think it goes back to what you and @rbaz were discussing earlier in this thread: the media has skewed reality to such a degree, and when you couple that with a media pool that is at best passive and nonconfrontational, you get outcomes such as this past election. Am I the only one who thought this (the election) was over by halftime?
I have always held a healthy disdain for polls (especially in national elections) because they tend to follow the same pattern: regardless of all the polls that came before, the one just before voting will most often call it a "close race." I have yet to see an election in my lifetime where this pattern veered far from that formula, which is a major reason I have no use for polls. As far as I am concerned, once again the polls and polling did not reflect what was really going on "on the ground."
I just can't believe this simple fact was missed by many so-called experts.
Louis, well, if you understand the thing being represented, that's a pretty good guard against many kinds of folly. And my purpose here is not to teach people how to do the math. There isn't space, time, or interest for that here. The idea is more to get people comfortable with asking for the math and having an idea of what it says when they get it. It's like a wine columnist, who doesn't teach you how to make wine but what to order when, and what to look for.
Lyndon, I think Krugman did a pretty solid job of explaining it too. Another way to look at distributions is to think of them as functions that convert local, specific margins into overall probabilities. But a key point not to lose is that distributions also apply to forecasting markets, liability, crime, war, and sports: any large-scale, wide-participation human activity. I guarantee that someone who is chuckling "silly Republicans" right now will make the same mathematical error within a day. (I hope to reduce the number, but I don't think it can be eliminated.)
Thanks, John, for explaining in part what happened to Republican pollsters and their understanding, or lack thereof, of distributions. The method of analysis seems easy enough; however, many people make this kind of mistake whenever they use this tool.
I am not sure I understand it completely either, but I take pride in practicing your fifth tip: knowing the ground. This alone can make up for numerous statistical shortcomings, IMO.
John Barnes writes: "Wishful thinking was only a minor factor in the massive, obvious, embarrassing error by conservative pundits who predicted that the 2012 presidential election would be a dead heat or even a Mitt Romney landslide. A profound misunderstanding of statistical distributions caused the humiliation of so many conservative bloggers, journalists, and campaign managers."
In a sense, the profound failure of GOP election prediction is a case of being hoist by their own petard. Karl Rove's vehement disbelief, witnessed by millions on live TV when Fox News analysts called Ohio for Obama, is iconic, and it seems to reflect a situation of believing the fantasies of the whacko reality you have constructed and led others into.
In another sense, the GOP prediction failure represents the failure of a kind of 21st-century Inquisition. The GOP directed venomous anger at both the polls and the analysts who dared to use math objectively and report results suggesting a rather solid Obama victory. This level of disbelief and rejection of science (math) reminds me of the pressure brought to bear on Galileo, forcing him to deny what his own scientific research and observations were telling him. Fortunately, in this election, the right-wing Inquisition simply fizzled.
Nate Silver of the NYT's 538 blog, a platform mainly for the presentation of the results of his own political analytics, has been widely hailed for the accuracy of his math-based predictions. For example, see:
Here are some interesting quotes:
Silver came through with flying colors, as Obama performed nearly exactly the way he said he would. The public recognition was immediate.
"You know who won the election tonight? Nate Silver," Rachel Maddow said on MSNBC. Even Fox News tipped its cap to Silver.
Others said that the results could force a bit of a sea change in political journalism.
"What does this victory mean?" Mashable's Chris Taylor wrote. "That mathematical models can no longer be derided by 'gut-feeling' pundits. That Silver's contention -- TV pundits are generally no more accurate than a coin toss -- must now be given wider credence."
Silver, of course, became a particularly hated target of the right-wing anti-science blitz that attempted to portray some kind of mysterious Romney "surge" until the bitter end.
Economist and NYT columnist Paul Krugman discussed much of this (somewhat along the lines of John Barnes's explanation) in a Nov. 4th blog entry:
Some of Krugman's interesting points:
First of all, from what I can see a lot of people have trouble with the distinction between probabilities and vote margins. ...
Second, people clearly have a problem with randomness — with the fact that any poll, no matter how carefully conducted, has a margin of error. (And the true margins of error are surely larger than the statistical measure always reported, since sampling error isn't the only way a poll can go wrong). ...
What this means is that if you look at all the polls, you're very likely to find one or two that tell you what you want to hear... even good pollsters will produce an occasional off result, and you really, really don't want to start picking and choosing those off results to make yourself feel good.
...Oh, and a third point: those margins of error are for any one poll. An average of many polls will have a much smaller standard error.
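Krugman's third point follows from the usual sampling-error formula: the margin of error shrinks with the square root of the sample size, so pooling ten comparable polls cuts it to roughly a third. A minimal sketch, assuming simple random samples of 1,000 respondents each, independent polls of the same population, and a 95 percent confidence level; as Krugman notes, real polls have other error sources that this formula ignores.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """95% margin of error for a polled proportion, counting sampling
    error only (simple random sample assumed)."""
    return z * math.sqrt(share * (1 - share) / sample_size)


single = margin_of_error(0.5, 1_000)       # one poll of 1,000 respondents
pooled = margin_of_error(0.5, 10 * 1_000)  # ten such polls pooled together
print(f"one poll:  +/- {single:.1%}")      # +/- 3.1%
print(f"ten polls: +/- {pooled:.1%}")      # +/- 1.0%
```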
Seth, accuracy wasn't really the issue here; it's just that when you have a series of close events and one side needs fewer wins than the other, the side that needs fewer wins has a massive advantage. As the IRA declared after its failed 1984 attempt to assassinate Margaret Thatcher in the Brighton hotel bombing, "You have to be lucky every time. We only have to be lucky once."
I saw the articles explaining an 80% chance of winning. It always amazes me how just a couple of percentage points here and there can tip major events in one direction. A single state poll may have a large margin of error, but the margin of error is much smaller when you aggregate different polls, since that creates a much larger sample size.