To me, "understanding" is the key word in this statement.
I believe that one of the reasons Silver's forecasts are good is that he has a multidisciplinary skillset. He has an understanding of statistics, the data, and the domain in which the prediction problem lies. As an economist, he has been trained to apply mathematical models to real-world problems, combining his understanding of statistics and econometrics to solve a very data-rich problem. Sounds like fun, actually.
Really, Silver's methodology isn't all that groundbreaking. Everything he does involves tried-and-true techniques that only seem revolutionary because he is successfully applying them in a way new to the domain. I think back to an old colleague, let's call him Carl, and his methodology for doing an internal business forecast, which was to put numbers into a spreadsheet and produce what I would call -- in polite company -- a naïve forecast. Poor Carl didn't know any better, and his results showed it. The forecast was awful. Carl wasn't a statistician or econometrician -- he was a finance guy. He knew finance.
When I helped Carl create a forecast that took into account seasonality, autocorrelation structure, structural changes, and so forth, you would have thought I had pulled out my wizard's staff and conjured a miracle. Carl was awestruck -- a lot like the people who have been following Silver's success. But I hadn't really done anything magical, and neither has Silver. We just applied existing techniques to a domain accustomed to doing things the old way.
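To make the contrast concrete, here is a minimal sketch in Python of a naive forecast next to a seasonal-naive one. The quarterly numbers, the function names, and the four-period season are all invented for illustration; none of it comes from Carl's spreadsheet or from Silver's model. It also covers only seasonality. Autocorrelation structure and structural changes would call for a proper time-series model.

```python
# Hypothetical quarterly series: a repeating 4-period seasonal pattern
# plus a slow upward trend. Illustrative numbers only.
SEASON = 4
history = [100, 140, 90, 120, 104, 144, 94, 124, 108, 148, 98, 128]
actual_next = [112, 152, 102, 132]  # what "really" happened next (also made up)


def naive_forecast(series, horizon):
    """Repeat the last observed value for every future period."""
    return [series[-1]] * horizon


def seasonal_naive_forecast(series, horizon, season):
    """Repeat the value from the same position in the last full season."""
    last_season = series[-season:]
    return [last_season[h % season] for h in range(horizon)]


def mean_absolute_error(forecast, actual):
    """Average absolute miss per period."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


naive = naive_forecast(history, SEASON)
seasonal = seasonal_naive_forecast(history, SEASON, SEASON)

naive_mae = mean_absolute_error(naive, actual_next)
seasonal_mae = mean_absolute_error(seasonal, actual_next)

print("naive MAE:", naive_mae)              # 17.5
print("seasonal-naive MAE:", seasonal_mae)  # 4.0
```

On this invented series the naive forecast misses by 17.5 units per quarter on average, while the seasonal-naive forecast misses by 4. Nothing magical, just a technique matched to the shape of the data.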
To be successful, you have to understand the domain very well. You have to understand the data extremely well. You have to understand information technology at an expert level. You have to understand the tools in your toolbox. And you have to understand how to put all of this together in a creative way to solve the problem at hand. It takes experimentation, curiosity, and creativity. That's what I believe the term "data scientist" implies more than anything else.
Big data is a multidisciplinary game. Success requires deep expertise in multiple disciplines as well as the creativity to solve problems in ways that haven't been tried before. That's one of the reasons there is such a shortage of people who can handle big data effectively. We've been handling data in a certain way for so long that change is difficult, and our existing talent pipeline is still tooled for the old approach. Higher education is still churning out skillsets that are largely uni-disciplinary, and businesses are no different. I sometimes think that when I say the word "analytics" to most audiences, their brains translate the word into "business intelligence," with visions of OLAP cubes and KPI dashboards dancing in their heads. Repeat after me: text analytics, neural networks, nonlinear optimization, simulation, bootstrapping. Please, please don't show me another pretty BI presentation tool and ask me if it will meet my analytic needs. Please.
It's the newer businesses of the Internet Age that seem to be most effective in dealing with big data, in part because they had to invent the new approach, and they had little to unlearn.
And speaking of unlearning, Silver's book is the first one I've encountered where most of the references use e-reader positions instead of page numbers. I guess I have to spring for the e-reader edition and adapt my research approach if I want to dig deeper. So, good luck to all of you old-schoolers out there. You've officially been left behind. Maybe you can find solace in your OLAP cubes.
And Silver, I'll be seeing you in the blogosphere in about four years, buddy. Keep tuning that model.
Hi Doug, it's good to hear from you. Great point. That "one version of the truth" idea is one of the most pernicious beliefs going. What's the census in your hospital today? Well, it depends on things like when you measure it, which systems you're sourcing from, and, perhaps most importantly, why you are asking the question. I understand why people don't want different reports showing different numbers for that census, but there could be good reasons why they are different. Perhaps the reports were intended to answer different questions, or they were sourced from different points in the life cycle.
One of my favorite quotes from the world of statistics is from George Box:
"All models are wrong, but some are useful."
My slightly tongue-in-cheek corollary to that is:
"All reports are wrong, but some are useful."
Reports involve measures of real-world events, and all measures are subject to error.
The way I think of it is that (most of) B.I. is about delivering up WHAT THE ORGANIZATION ALREADY KNOWS IS IMPORTANT. The various B.I. architectures, tools and typically heavy IT processes that are familiar to us all were designed to do just that, quickly, efficiently, consistently, and repeatedly. If B.I. has a dominant (if misbegotten) motto, it's got to be "One version of the truth." Which is very telling, because a singular truth is only conceivably possible when looking at what's already happened. *
In contrast, (not-B.I.) analytics is primarily about DISCOVERING WHAT IS NOT KNOWN, or projecting what could plausibly happen in the future. There can be no singular truth about the future; too many things can happen. So much depends on what the butterflies in Brazil decide to do. Or a black swan.
So B.I. and traditional reporting allow an organization to manage reactively, in response to what has already happened. And the need to do that will probably never go away. But exploratory/predictive analytics help the organization discover new things, see into the future, and manage more proactively in anticipation of things that are likely to happen.
Mark's reference to a "data ecosystem" is a useful, evocative phrase. I'm going to try to remember that.
(* Just for fun, I challenge proponents of "one version of the truth" to explain why we have 1+19,999 different books describing and explaining the life of Abraham Lincoln.)
@David.Pope - Great thoughts. I think BI tools are a critical component of the big data ecosystem, and making the analytic result organic to the business process and easily consumable is a key part of the analytics value chain. What I'm trying to do is broaden the horizons of those who don't understand there is something beyond star schemas, SQL, and OLAP cubes.
@philsimon - Thanks for your comment - it's good to hear from you. I couldn't agree more regarding the exploratory nature of analytics. The two adjectives I use most often are "exploratory" and "iterative." You don't have to know everything going in, but you do have to be able to learn quickly and "connect-the-dots" across previous experience and other disciplines.
One of the main difficulties we all seem to keep encountering is the mistaking of BI and analytics for the same thing (thereby diluting the true business value derived when analytics really are used). I believe we can all agree that, to be impactful, analytic insight must be deployed into operational systems. From a technical perspective, data scientists may rightly assume I am talking about how to run scoring (the end result of developing a predictive model) in different systems, platforms, and so on. However, it has recently become more and more apparent to me that deploying analytics from a business perspective is, in essence, having the analytics-based insight show up and be easily understood by others across an organization. This is where the problem comes in, because for the insight to show up and be understood, it has to be in a "report," or in other words, BI. This is where having to "sell" analytic value becomes very important, because most end consumers associate the value of analytics with the BI-based report they receive. The best way, I believe, to show someone the difference would be to show them a report with true analytics baked in and then the same exact report with the analytics-based insight REMOVED, and then ask which gives more value. In the case Mark mentioned regarding forecasting, the problem is a bit more difficult because the reports may look exactly the same from a formatting perspective. They may both have numbers and graphs based on those numbers; it's just that the numbers and graphs based on analytics provide a more accurate end result.
"To be successful, you have to understand the domain very well. You have to understand the data extremely well. You have to understand information technology at an expert level. You have to understand the tools in your toolbox. And you have to understand how to put all of this together in a creative way to solve the problem at hand. It takes experimentation, curiosity, and creativity. That's what I believe the term 'data scientist' implies more than anything else."
I'd argue that you have to know that you don't know everything going in. There's an exploratory nature to Big Data that many people don't understand. It's not a checklist. It's a journey.
@BethSchulz - Thanks for your comment. BI still has a place in the ecosystem, even for me. The point I'm trying to make is that we have more than just BI in our toolbox these days, but many people I encounter limit their thinking to BI solutions. When I say analytics, I'm referring to BI but also to much, much more.
Mark, interesting points you raise here. Let me start with this one: "I sometimes think that when I say the word 'analytics' to most audiences, their brains translate the word into 'business intelligence,' with visions of OLAP cubes and KPI dashboards dancing in their heads. Repeat after me: text analytics, neural networks, nonlinear optimization, simulation, bootstrapping. Please, please don't show me another pretty BI presentation tool and ask me if it will meet my analytic needs. Please." Is this to say that at your level BI isn't for you, or that at any level BI isn't for a company any longer -- that it must advance its thinking?