As I pointed out in my first post, my new favorite quote, compliments of Nate Silver's The Signal and the Noise: Why So Many Predictions Fail -- But Some Don't, is: "We face danger whenever information growth outpaces our understanding of how to process it."
To me, "understanding" is the key word in this statement.
I believe that one of the reasons Silver's forecasts are good is that he has a multidisciplinary skillset. He has an understanding of statistics, the data, and the domain in which the prediction problem lies. As an economist, he has been trained to apply mathematical models to real-world problems, combining his understanding of statistics and econometrics to solve a very data-rich problem. Sounds like fun, actually.
Really, Silver's methodology isn't all that groundbreaking. Everything he does involves tried-and-true techniques that only seem revolutionary because he is successfully applying them in a way new to the domain. I think back to an old colleague, let's call him Carl, and his methodology for doing an internal business forecast, which was to put numbers into a spreadsheet and produce what I would call -- in polite company -- a naïve forecast. Poor Carl didn't know any better, and his results showed it. The forecast was awful. Carl wasn't a statistician or econometrician -- he was a finance guy. He knew finance.
When I helped Carl create a forecast that took into account seasonality, autocorrelation structure, structural changes, and so forth, you would have thought I had pulled out my wizard's staff and conjured a miracle. Carl was awestruck -- a lot like the people who have been following Silver's success. But I hadn't really done anything magical, and neither has Silver. We just applied existing techniques to a domain accustomed to doing things the old way.
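To make the point concrete, here is a minimal sketch of how much even a simple seasonal adjustment can help. It is not Carl's spreadsheet or the forecast we actually built. The data is synthetic, the function names are hypothetical, and I am assuming "naïve" in the textbook sense of carrying the last observed value forward. It handles seasonality only, not autocorrelation or structural changes.

```python
import math

def make_series(n_periods, season_length=12):
    """Synthetic monthly series (an assumption): linear trend plus a seasonal swing."""
    return [
        100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / season_length)
        for t in range(n_periods)
    ]

def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, horizon, season_length=12):
    """Repeat the value observed in the same position of the last full season."""
    return [history[-season_length + (h % season_length)] for h in range(horizon)]

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Three years of history to forecast from, one year held out for scoring.
series = make_series(48)
train, test = series[:36], series[36:]

naive_mae = mean_absolute_error(test, naive_forecast(train, len(test)))
seasonal_mae = mean_absolute_error(test, seasonal_naive_forecast(train, len(test)))

print(f"naive MAE:          {naive_mae:.2f}")
print(f"seasonal naive MAE: {seasonal_mae:.2f}")
```

On this made-up series the seasonal version misses only by the year of trend growth it ignores, while the flat forecast misses the whole seasonal swing as well. A real forecast would go further, but the gap between the two is the kind of improvement that looked like wizardry to Carl.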
To be successful, you have to understand the domain very well. You have to understand the data extremely well. You have to understand information technology at an expert level. You have to understand the tools in your toolbox. And you have to understand how to put all of this together in a creative way to solve the problem at hand. It takes experimentation, curiosity, and creativity. That's what I believe the term "data scientist" implies more than anything else.
Big data is a multidisciplinary game. Success requires deep expertise in multiple disciplines as well as the creativity to solve problems in ways that haven't been tried before. That's one of the reasons there is such a shortage of people who can handle big data effectively. We've been handling data in a certain way for so long that change is difficult, and our existing talent pipeline is still tooled for the old approach. Higher education is still churning out skillsets that are largely uni-disciplinary, and businesses are no different. I sometimes think that when I say the word "analytics" to most audiences, their brains translate the word into "business intelligence," with visions of OLAP cubes and KPI dashboards dancing in their heads. Repeat after me: text analytics, neural networks, nonlinear optimization, simulation, bootstrapping. Please, please don't show me another pretty BI presentation tool and ask me if it will meet my analytic needs. Please.
It's the newer businesses of the Internet Age that seem to be most effective in dealing with big data, in part because they had to invent the new approach, and they had little to unlearn.
And speaking of unlearning, Silver's book is the first one I've encountered where most of the references use e-reader positions instead of page numbers. I guess I have to spring for the e-reader edition and adapt my research approach if I want to dig deeper. So, good luck to all of you old-schoolers out there. You've officially been left behind. Maybe you can find solace in your OLAP cubes.
And Silver, I'll be seeing you in the blogosphere in about four years, buddy. Keep tuning that model.