So please, don't try to impress me with your massive data cluster. Don't brag about hackathons and contests. You've got funding -- so what? Show me an acknowledged business issue, one that is causing businesses to hemorrhage money, and how you are going to solve it.
While we're examining the worth of big-data, let's also give some thought to its origins. The most widely publicized source of big-data is social media. The commercial appeal of social media data lies primarily in potential advertising revenue. The more we know about the consumer, the greater the opportunity for targeted advertising that maximizes revenue. At least, that's the theory. But social media is not the whole big-data landscape.
Kathleen Morrissey, a partner at Strategy 2 Market, a Chicago-based new product development consulting service, advises tech companies developing manufactured goods and medical devices. When asked, "What's big-data?" she doesn't talk about specific quantities. Instead, Morrissey extends the popular "three V's" -- volume, variety, and velocity -- with a fourth one: value. And she refers to three classes of data: traditional data, machine-generated/sensor data, and social media/new formats.
Let's examine those data classes.
Traditional data includes the information that organizations collect in the routine course of business, and collected even before the computer age. Examples include records of business transactions, such as those from customer relationship management, point-of-sale, and credit reporting systems. Also in this class are records of government activity, such as IRS filings and census reports. Stock transactions, mapping, and much scientific research data fall under this umbrella, too.
Machine-generated and sensor data is information recorded automatically by a device, without human intervention. Flight data recorders are a well-known example, but automobiles, medical devices, and a wide variety of industrial machinery also produce logs automatically. Web activity logs, global positioning system data, and automated toll payment data are other important examples of this data category.
Social media and new formats include shared content, such as text, video, and audio from social media sites, blogs, and other websites. Emerging sources in this category include facial recognition, satellite imaging, and data on social linkages and influence.
With such a wide field of potential data sources, how can you identify the most valuable opportunities for exploiting them?
Identify a business problem involving a large number of actions, each with an associated potential for savings or increased revenue. Is there data available (or obtainable) that would help you address the problem? Do you need a lot of data, or would a well-crafted sample do the job? If it's your own problem to solve, a sample is often the better choice: it can answer the question with far less data to store and process.
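For a rough sense of how far a sample can go, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `savings` list is synthetic data standing in for a large log of per-action savings, `sample_estimate` is a hypothetical helper, and the margin uses a simple normal approximation. It is not a recipe for any particular business problem, and a genuinely well-crafted sample may call for more care (stratification, for instance) than the simple random draw shown here.

```python
import random
import statistics


def sample_estimate(values, sample_size, seed=0):
    """Estimate the mean of `values` from a simple random sample.

    Returns (estimate, margin), where margin is an approximate 95%
    half-width based on the normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    # Standard error of the sample mean (no finite-population correction).
    stderr = statistics.stdev(sample) / sample_size ** 0.5
    return mean, 1.96 * stderr


# Synthetic stand-in for a large log: savings per action, in dollars.
data_rng = random.Random(42)
savings = [data_rng.uniform(0.0, 10.0) for _ in range(200_000)]

full_mean = statistics.fmean(savings)
estimate, margin = sample_estimate(savings, sample_size=2_000)

print(f"full data: {full_mean:.2f}")
print(f"sample   : {estimate:.2f} +/- {margin:.2f}")
```

If the margin from a 1% sample is already tight enough to support the decision at hand, the other 99% of the data adds cost without adding much insight. If it isn't, that's a sign the problem may be one that really does demand a lot of data.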
If you're selling the data, though, you'll want to find a problem that absolutely demands a lot of data. That's why individually targeted ads are such a hot area for social media data. But there are other cases that call for lots of data, such as fraud detection, credit scoring, and network security, each of which may be a valuable opportunity that depends on examining many transactions in detail.
Next time the talk turns to big-data, impress us all with a story of a big business problem, and your plan to solve it. And, if you've already done so, share on the message board below!