Wishful thinking was only a minor factor in the massive, obvious, embarrassing error by conservative pundits who predicted that the 2012 presidential election would be a dead heat or even a Mitt Romney landslide. A profound misunderstanding of statistical distributions caused the humiliation of so many conservative bloggers, journalists, and campaign managers.
But before anyone sneers, thousands of managers trying to interpret analytics make that same mistake every day: confusing numbers with distributions.
Numbers report single facts (prices, distances, times), but distributions are sets of numbers, often expressed as graphs, describing situations (odds, possibilities, densities). Joe Scarborough, David Brooks, and dozens of other political journalists argued that, since polls leaned toward Barack Obama by only 1 to 2 percent (a number), statistical forecasts (i.e., distributions) predicting a better than 80 percent chance of an Obama victory couldn't be right. It had to be a dead heat. Unfortunately for the conservative pundits, directly comparing the numbers from raw data and the statistics describing a distribution is as meaningless as the famous score in Calvinball: 12 to Q.
When we say a variable has a probability distribution, what we mean is that there is a probability associated with every possible value of the variable. For example, if the variable is "total heads after four flips of a coin," the values can only be 0, 1, 2, 3, or 4, with this distribution:
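That coin-flip distribution can be computed exactly with a few lines of Python -- a minimal sketch (the variable names are mine, not from the original):

```python
from math import comb

# Exact distribution of "total heads after four flips of a fair coin".
# Each of the 2**4 = 16 flip sequences is equally likely, and comb(4, k)
# counts how many of them contain exactly k heads.
n = 4
dist = {k: comb(n, k) / 2**n for k in range(n + 1)}

for k, p in dist.items():
    print(f"{k} heads: {p:.4f}")
# 0 heads: 0.0625
# 1 heads: 0.2500
# 2 heads: 0.3750
# 3 heads: 0.2500
# 4 heads: 0.0625
```

Note that the probabilities sum to 1 and pile up in the middle: two heads is six times as likely as zero.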
Although individual voters don't flip a coin, the math is the same: equal numbers of two possibilities in random order. Imagine the state of Normalia, which has exactly 1 million voters -- 500,000 supporters each for Obama and Romney. Here's the Normalia distribution:
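The Normalia picture is just the four-flip distribution scaled up to a million flips. As a sketch -- treating each voter as an independent fair coin, which is an idealization for illustration -- the count of Obama votes is Binomial(1,000,000, 0.5), well approximated by a normal curve:

```python
from math import sqrt, pi

# Normalia sketch: 1,000,000 voters, each modeled as an independent fair
# coin (an idealization; real voters are not coins, but the math is the same).
n = 1_000_000
p = 0.5
mean = n * p                  # 500,000 -- the tie line
sd = sqrt(n * p * (1 - p))    # 500 -- the spread of plausible outcomes

# Probability of the single most likely outcome (an exact tie),
# approximated by the normal density at the mean:
p_tie = 1 / (sd * sqrt(2 * pi))
print(f"sd = {sd:.0f} votes, P(exact tie) ~= {p_tie:.6f}")
```

The spread is only about 500 votes either way, which is why the distribution is a tall, narrow spike centered on the tie.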
The single most likely outcome is a tie, and combined probabilities are equal on each side of the dashed red tie line. If all states (plus the District of Columbia) were Normalia, and Electoral College votes were distributed among them as evenly as possible (28 states with 11 and 23 states with 10), the distribution for Electoral College votes would have looked like this:
Since the national popular vote felt close to 50-50 (the ratio was actually about 101-96), pundits of limited numeracy pictured a distribution like that of the United States of Normalia. This is par for the course among many managers. I've attended countless research presentations in which the apparent smallness of the differences made hands begin to wave, dismissing the real, grainy, local lumpiness in the eagerness to get on with applying intuition and experience.
Just remember that distributions with a central spike overreact. A small change in individual preferences shrinks one tail, fattens and lengthens the other, and moves that central peak toward the fatter tail, while the majority line stays in the same place. (This is what statisticians call skew, and it refers to descriptive geometry, not liberal conspiracy.) Obama's advantage of about 2.5 percent would move his chances of getting a majority in Normalia from 50-50 to 53.9-46.1. Furthermore, repeated application of distributions is nonlinear. A 53.9 percent chance of a majority, applied across 51 Normalias, becomes a 60.1 percent chance of a majority in that imaginary, all-things-even Electoral College.
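The nonlinear amplification is easy to see in a Monte Carlo sketch. Here I take the article's 53.9 percent per-state figure as given, assume the 51 identical Normalias (28 worth 11 electoral votes, 23 worth 10) are statistically independent -- an assumption of mine, since the article doesn't spell out its model, so the exact result may differ from its 60.1 percent -- and estimate the chance of reaching 270 of the 538 electoral votes:

```python
import random

# Monte Carlo sketch of "repeated application of distributions is nonlinear":
# the same candidate has a 53.9% chance of carrying each of 51 identical
# states, assumed independent (my simplifying assumption).
random.seed(0)

P_STATE = 0.539
EV = [11] * 28 + [10] * 23        # 538 electoral votes in all
TRIALS = 100_000

wins = 0
for _ in range(TRIALS):
    ev = sum(v for v in EV if random.random() < P_STATE)
    if ev >= 270:
        wins += 1

p_ec = wins / TRIALS
print(f"Chance of an Electoral College majority: {p_ec:.3f}")
```

Under full independence the aggregate probability comes out well above the per-state 53.9 percent; the precise number depends on modeling choices, but the qualitative point survives: a small per-state edge becomes a much larger edge in the aggregate.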
But none of the states was Normalia, and nothing was even. The lumpiness of the real world meant that Obama started with 237 electoral votes in the bag to Romney's 191; only the nine swing states and their 110 electoral votes were in dispute. Polls from a generally conservative source, just before the election, showed pro-Obama skews in seven swing states and pro-Romney skews in two.
Based on that data, the real distribution looked something like this:
And in that lumpy, real distribution, a 2.5 percent advantage in individual preferences equates to an 84.24 percent chance of an Electoral College win.
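A lumpy-map simulation is only a few more lines. The 237/191 base and the nine swing states worth 110 electoral votes come from the article; the per-state win probabilities below are purely illustrative placeholders of mine (the article cites polls but not these numbers), so the output will not match the article's 84.24 percent -- the point is the shape of the calculation:

```python
import random

# Sketch of the lumpy 2012 map: Obama starts at 237 electoral votes,
# Romney at 191, and only nine swing states (110 votes) are in play.
# The probabilities below are hypothetical, for illustration only.
random.seed(1)

OBAMA_BASE = 237
SWING = {   # state: (electoral votes, hypothetical P(Obama win))
    "FL": (29, 0.50), "OH": (18, 0.75), "NC": (15, 0.30),
    "VA": (13, 0.70), "WI": (10, 0.80), "CO": (9, 0.70),
    "IA": (6, 0.80), "NV": (6, 0.80), "NH": (4, 0.75),
}
TRIALS = 100_000

wins = 0
for _ in range(TRIALS):
    ev = OBAMA_BASE + sum(v for v, p in SWING.values() if random.random() < p)
    if ev >= 270:
        wins += 1

p_270 = wins / TRIALS
print(f"P(Obama reaches 270) ~= {p_270:.3f}")
```

Starting from 237, Obama needs only 33 of the 110 disputed votes, which is why modest per-state leads translate into a lopsided overall probability.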
The real-world Electoral College graph is amazingly different from life in Normalia. Here's what you should do as you move toward a greater reliance on metrics.
Remember that what comes out of a distribution may bear little resemblance to the microdecisions it comprises.
Know the shape of the distribution.
Analyze to see where your goals fall on the distribution.
Be especially careful around successive distribution problems (like the translation of popular vote to electoral vote, the adoption of a tech standard across several platforms, or multiple wholesale/retail connections), because tiny differences can blow up fast.
Know the ground; most of what happens, happens locally and stochastically. (Globally, the average adult human has one testicle and one ovary, but locally, hardly anyone has met someone like that, except maybe at the Romney victory party.)
Louis, I'd say they saw the ground but not the implications. Kind of like recognizing that "well it's a high scoring game, we are only down by a touchdown and a field goal, and there is still five minutes on the clock" but then not getting to the conclusion "we have to play it out but we are almost certain to lose." Not so much ignoring the facts -- the polls were very accurate this time -- but refusing to see what the facts meant, and instead insisting on just repeating whichever facts made you happiest.
I see what you are saying, John, so is it safe to say Republican pollsters simply did not understand what the underlying meaning of the distribution was in reality? And if this was the case, how can seasoned campaign managers make such a colossal blunder? Objectivity lost to partisan politics? Well, of course it was.
But I think it goes to what you and @rbaz were discussing earlier in this thread: the media has skewed reality to such a degree, and coupling that with a media pool that is at best passive and non-confrontational produces outcomes such as this past election. Am I the only one who thought this (the election) was over by half-time?
I have always held a healthy disdain for polls (especially in national elections) because they tend to repeat themselves in flow (meaning that regardless of all the polls that came before, the one just before voting will most often be deemed a "close race"). I have yet to see one in my lifetime where this pattern veered far from that formula, which is a major reason I have no use for polls. As far as I am concerned, yet again polls and polling did not reflect what was really going on "on the ground."
I just can't believe this simple fact was missed by so many so-called experts.
Louis, well, if you understand the thing being represented, that's a pretty good guard against many kinds of folly. And my purpose here is not to teach people how to do the math. There isn't space, time, or interest for that here. The idea is more to get people comfortable with asking for the math and having an idea of what it says when they get it. Kind of like the wine columnist doesn't teach you how to make wine, but what to order when and what to look for.
Lyndon, I think Krugman did a pretty solid job of explaining too. Another way to look at distributions is to think of them as functions that convert local and specific margins into overall probabilities. But a key point not to be lost is that distributions also apply to forecasting markets, liability, crime, war, sports, any large scale wide participation human activity. I guarantee that someone who is chuckling "silly Republicans" right now will make the same mathematical error themselves within a day. (I hope to reduce the number but I don't think it can be eliminated).
Thanks, John, for explaining in part what happened to Republican pollsters with regard to understanding distributions, or the lack thereof. The method of analysis seems easy enough, yet many make this kind of mistake whenever this tool is in use.
I am not sure I understand it completely either, but I take pride in practicing your fifth tip -- knowing the ground. This alone can make up for numerous statistical shortcomings, IMO.
John Barnes writes: "Wishful thinking was only a minor factor in the massive, obvious, embarrassing error by conservative pundits who predicted that the 2012 presidential election would be a dead heat or even a Mitt Romney landslide."
In a sense, the profound failure of GOP election prediction reflects a case of being hoist with their own petard. Karl Rove's vehement disbelief, witnessed by millions on live TV when Fox News analysts called Ohio for Obama, is iconic, and it seems to reflect what happens when you believe the fantasies of the whacko reality you have constructed and led others into.
In another sense, the GOP prediction failure represents a failure of a kind of a 21st-century Inquisition. The GOP targeted venomous anger against both polls and analysts who dared to use math objectively and read the results that suggested a rather solid Obama victory. This level of disbelief and rejection of science (math) reminds me of the pressure brought to bear on Galileo, forcing him to deny what his own scientific research and observations were telling him. Fortunately, for this election, the rightwing Inquisition simply fizzled.
Nate Silver of the NYT's 538 blog, a platform mainly for the presentation of the results of his own political analytics, has been widely hailed for the accuracy of his math-based predictions. For example, see:
Here are some interesting quotes: Silver came through with flying colors, as Obama performed nearly exactly the way he said he would. The public recognition was immediate.
"You know who won the election tonight? Nate Silver," Rachel Maddow said on MSNBC. Even Fox News tipped its cap to Silver.
Others said that the results could force a bit of a sea change in political journalism.
"What does this victory mean?" Mashable's Chris Taylor wrote. "That mathematical models can no longer be derided by "gut-feeling" pundits. That Silver's contention -- TV pundits are generally no more accurate than a coin toss -- must now be given wider credence."
Silver, of course, became a particularly hated target of the rightwing anti-science blitz that attempted to portray some kind of mysterious Romney "surge" till the bitter end.
Economist and NYT columnist Paul Krugman discussed much of this (somewhat along the lines of John Barnes's explanation) in a Nov. 4th blog entry:
Some of Krugman's interesting points: First of all, from what I can see a lot of people have trouble with the distinction between probabilities and vote margins. ...
Second, people clearly have a problem with randomness — with the fact that any poll, no matter how carefully conducted, has a margin of error. (And the true margins of error are surely larger than the statistical measure always reported, since sampling error isn't the only way a poll can go wrong). ...
What this means is that if you look at all the polls, you're very likely to find one or two that tell you what you want to hear... even good pollsters will produce an occasional off result, and you really, really don't want to start picking and choosing those off results to make yourself feel good.
...Oh, and a third point: those margins of error are for any one poll. An average of many polls will have a much smaller standard error.
Seth, accuracy wasn't really an issue here; it's just that when you have successive close-numbers events and one side needs fewer wins than the other, the side that needs fewer wins has a massive advantage. As the IRA communicated to the Queen after a failed assassination attempt, "You have to be lucky every time. We only have to be lucky once."
I saw the articles explaining an 80% chance of winning. It always amazes me how just a couple of percentage points here and there can cause major events to go in one direction. A single state poll may have a large margin of error, but the margin is much reduced when you aggregate different polls, since that creates a much larger sample size.
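The arithmetic behind that shrinkage is simple. The standard error of a polled proportion is sqrt(p*(1-p)/n), so pooling ten 1,000-respondent polls behaves like one 10,000-respondent poll -- a sketch that assumes, as Krugman's caveat above warns is unrealistic, that the polls are independent and that sampling error is the only error:

```python
from math import sqrt

p = 0.5  # worst case for the error term

def moe(n: int) -> float:
    """95% margin of error (1.96 standard errors) for a sample of size n."""
    return 1.96 * sqrt(p * (1 - p) / n)

print(f"One poll,  n=1,000:  +/-{moe(1_000):.1%}")   # about +/-3.1%
print(f"Ten polls, n=10,000: +/-{moe(10_000):.1%}")  # about +/-1.0%
```

Ten times the sample cuts the margin of error by sqrt(10), roughly a factor of three.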