I’m not just a supporter, but an ardent fan of big data. I believe big data is crucial to solving some of the most important problems of our day, from improving healthcare and making it more affordable, to making the world safer and advancing individual liberties, to growing the economy and protecting the environment.
I also think all of these efforts will prove far more difficult than we imagine. Poor data quality is not the only obstacle, but it ranks near the top.
I find two examples especially instructive: the search for the Higgs boson, or Higgs particle, and collateralized debt obligations. The discovery of the Higgs boson is a crowning achievement. Collateralized debt obligations, not so much.
One can't help but be impressed by the rigor scientists follow as they create and analyze data. They design their experiments carefully, define key terms precisely, manage their data collection processes, calibrate their instruments, and scrutinize the data for inadvertent errors. They analyze their data from numerous perspectives. Many seek connections with other disciplines. “Control” is the watchword throughout.
In the case of the Higgs boson, physicists waited until they could rule out a spurious result with high confidence (less than one chance in three million). That thoroughness has paid off: just yesterday, as widely reported, the European Organization for Nuclear Research, or CERN as it's commonly known, announced that its official findings have passed peer review. But even now they aren’t claiming they’ve found the Higgs boson, only that they’ve found a “Higgs-like” particle.
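For readers curious where a figure like "one chance in three million" comes from: particle physicists conventionally require a "five-sigma" result before claiming a discovery, meaning the observed signal would arise from background fluctuations alone with roughly the probability of a standard normal variable exceeding 5. The short Python sketch below, assuming that one-sided Gaussian convention, computes that probability; the function name `one_sided_tail` is illustrative only.

```python
import math

def one_sided_tail(sigma: float) -> float:
    """Probability that a standard normal variable exceeds `sigma`."""
    # Upper-tail area of the standard normal: 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(sigma / math.sqrt(2))

p = one_sided_tail(5.0)
print(f"p = {p:.3e}")              # p = 2.867e-07
print(f"about 1 in {1 / p:,.0f}")  # roughly 1 in 3.5 million
```

The result, roughly one in 3.5 million, sits just below the "one chance in three million" threshold quoted above.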
The Higgs boson example stands in marked contrast to the rush to package and sell collateralized debt obligations. If financial institutions pride themselves on one thing, it's their ability to price risk. And collateralized debt obligations held enormous promise: by slicing and dicing mortgages, they could package risk to suit individual investors. But, as everyone knows, the underlying data proved to be bad. And all of us are still suffering from the fallout.
In this new age of big data, the old truism surely persists: massive quantities of garbage in, massive quantities of garbage out.