Spare Me Tales of Your Massive Data Cluster

The worth of a dataset is not in its size, but in the size of the problem you can solve with it.

So please, don't try to impress me with your massive data cluster. Don't brag about hackathons and contests. You've got funding -- so what? Show me an acknowledged business issue, one that is causing businesses to hemorrhage money, and how you are going to solve it.

While we're examining the worth of big-data, let's also give some thought to its origins. The most widely publicized source of big-data is social media. The commercial appeal of social media data lies primarily in potential advertising revenue. The more we know about the consumer, the greater the opportunity for targeted advertising that maximizes revenue. At least, that's the theory. But social media is not the whole big-data landscape.

Kathleen Morrissey, a partner at Strategy 2 Market, a Chicago-based new product development consulting service, advises tech companies developing manufactured goods and medical devices. When asked, "What's big-data?" she doesn't talk about specific quantities. Instead, Morrissey extends the popular "three V's" -- volume, variety, and velocity -- with a fourth one: value. And she refers to three classes of data: traditional data, machine-generated/sensor data, and social media/new formats.

Let's examine those data classes.

Traditional data includes the information that organizations collect in the routine course of business, much of which they collected even before the computer age. Examples include records of business transactions, such as those kept in customer relationship management, point-of-sale, and credit reporting systems. Government records, such as IRS filings and census reports, also belong to this class. Stock transactions, mapping, and much scientific research data fall under this umbrella, too.

Machine-generated and sensor data is information recorded automatically by a device, without human intervention. Flight data recorders are a well-known example, but automobiles, medical devices, and a wide variety of industrial machinery also generate logs on their own. Web activity logs, global positioning system (GPS) data, and automated toll payment data are other important examples of this category.

Social media and new formats include shared content, such as text, video, and audio from social media sites, blogs, and other Websites. Emerging sources in this category include facial recognition, satellite imaging, and data on social linkages and influence.

With such a wide field of potential data sources, how can you identify the most valuable opportunities for exploiting them?

Identify a business problem involving a large number of actions, each with an associated potential for savings or increased revenue. Is there data available (or obtainable) that would help you address the problem? Do you need a lot of data, or would a well-crafted sample do the job? If the problem is yours to solve, sampling is often good enough.
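To put a rough number on "would a well-crafted sample do the job?", here is a minimal Python sketch of the standard normal-approximation sample-size formula for estimating a proportion (for example, the share of customers who respond to an offer) within a chosen margin of error. It assumes simple random sampling, which the article itself does not specify, and the function name and parameters are illustrative rather than taken from any particular tool.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Records needed to estimate a proportion within +/- margin.

    Uses the normal-approximation formula n0 = z^2 * p * (1 - p) / margin^2.
    p = 0.5 is the most conservative (largest n) choice when the true
    proportion is unknown. If a finite population size N is given, the
    finite population correction n = n0 / (1 + (n0 - 1) / N) is applied.
    Assumes simple random sampling.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n = n / (1 + (n - 1) / population)
    # Round up: a fractional record is not possible
    return math.ceil(n)


if __name__ == "__main__":
    # Same precision target, very different data-store sizes
    for size in (None, 1_000_000, 10_000_000):
        print(size, sample_size_for_proportion(0.01, population=size))
```

At 95% confidence and a margin of plus or minus 1 percentage point, the formula calls for about 9,604 records, and applying the finite population correction for one million or ten million records lowers that only slightly. The required sample depends mainly on the precision you need, not on how big the data store is.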

If you're selling the data, though, you'll want to find a problem that absolutely demands a lot of data. That's why individually targeted ads are such a hot area for social media data. Other problems that call for lots of data include fraud detection, credit scoring, and network security; each can be a valuable opportunity, because each requires examining many transactions in detail.

Next time the talk turns to big-data, impress us all with a story of a big business problem, and your plan to solve it. And, if you've already done so, share on the message board below!

Meta S. Brown, Business Analytics Consultant

Meta S. Brown is a consultant, speaker, and writer who promotes the use of business analytics. A hands-on analyst who has tackled projects with up to $900 million at stake, she is a recognized expert in cutting-edge business analytics. She has conducted more than 4,000 hours of presentations about business analytics, and written guides on neural networks, quality improvement, statistical process control, and many other statistical methods. Meta's seminars have attracted thousands of attendees from across the US and Canada, from novices to professors.


Re: Balk, balk
  • 11/12/2012 8:05:47 PM

Thanks, Meta. Your article makes a lot of good points, and yes, value may be the most important V in the equation. I also think your data definitions will help people organize information better.

When people are buying data, what they should be looking for is information that will help them solve a problem, rather than a big basket of mess.

Re: Balk, balk
  • 11/12/2012 5:20:35 PM

You know, Beth, I don't think CEOs of big companies ask that question very often. At least not the folks on the buying end of the equation. But I do hear this from CEOs of little companies that sell data-centric services and products. Focusing on the size of the data store is technology-focused behavior. The really big bigwigs are more into $$$ talk.

Balk, balk
  • 11/12/2012 11:17:54 AM

Meta, it sounds like the old chicken-and-egg problem, exacerbated no doubt at lots of companies by mainstream coverage of the big-data phenomenon. How many CIOs or analytics execs do you think have fielded the "What are WE doing with big-data?" question from a CEO who's just read how company XYZ changed its life via big-data analytics? As you suggest, the savviest will respond by talking about critical business problems and how the company can use data -- big or not -- to solve them.

Excellent point
  • 11/12/2012 7:20:43 AM

You make an excellent point here, Meta: Show me an acknowledged business issue, one that is causing businesses to hemorrhage money, and how you are going to solve it.