The Trouble with Data About Data

Two people looking at the same analytical result can come to different conclusions. The same goes for the collection of data and its presentation. A couple of experiences underscore how the data about data -- even from authoritative sources -- may not be as accurate as the people working on the project or the audience believe. You guessed it: Bias can turn a well-meaning, "objective" exercise into a subjective one. In my experience, the most nefarious thing about bias is the lack of awareness or acknowledgement of it.

The Trouble with Research

I can't speak for all types of research, but I'm very familiar with what happens in the high-tech industry. Some of it involves considerable primary and secondary research, and some of it involves one or the other.

Let's say we're doing research about analytics. The scope of our research will include a massive survey of a target audience (because higher numbers seem to indicate statistical significance). The target respondents will be a subset of subscribers to a mailing list or individuals chosen from multiple databases based on pre-defined criteria. Our errors here most likely will include sampling bias (a non-random sample) and selection bias (aka cherry-picking).
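A tiny simulation makes the point that a big sample doesn't cure a biased one. The numbers below are entirely made up for illustration: suppose mailing-list subscribers skew more enthusiastic than the profession at large, and we survey only them.

```python
import random

random.seed(0)

# Hypothetical population: 10,000 professionals rating "analytics maturity" 1-5.
# Mailing-list subscribers (20% of the population) skew enthusiastic.
population = []
for _ in range(10_000):
    subscriber = random.random() < 0.2
    rating = random.choice([4, 5] if subscriber else [1, 2, 3, 4, 5])
    population.append((subscriber, rating))

true_mean = sum(r for _, r in population) / len(population)

# Non-random sample: survey only mailing-list subscribers (sampling bias).
sampled = [r for sub, r in population if sub]
sample_mean = sum(sampled) / len(sampled)

print(f"population mean: {true_mean:.2f}")
print(f"subscriber-only sample mean: {sample_mean:.2f}")
# Surveying more subscribers narrows the error bars around the wrong answer;
# it never closes the gap between the sample and the population.
```

The gap persists no matter how many subscribers we add, which is why "massive survey" and "statistically significant" are not synonyms.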

The survey respondents will receive a set of questions that someone has to define and structure. That someone may have a personal agenda (confirmation bias), may be privy to an employer's agenda (funding bias), and/or may choose a subset of the original questions (potentially selection bias).

The survey will be supplemented with interviews of analytics professionals who, demographically speaking, represent the audience we survey. However, they will have certain distinguishing attributes -- a high profile, or employment at a high-profile company (selection bias). We likely won't be able to use everything a person says, so we'll omit some of it -- selection bias and confirmation bias combined.

We'll also do some secondary research that bolsters our position -- selection bias and confirmation bias, again.

Then, we'll combine the results of the survey, the interviews, and the secondary research. Not all of it will be usable because it's too voluminous, irrelevant, or contradicts our position. Rather than stating any of that as part of the research, we'll just omit those pieces -- selection bias and confirmation bias again. We can also structure the data visualizations in the report so they underscore our points (and misrepresent the data).

We Need to Improve, Desperately

Bias is not something that happens to other people. It happens to everyone because it is natural, whether conscious or unconscious. Rather than dismiss it, it's prudent to acknowledge the tendency, attempt to identify what types of bias may be involved and why, and rectify them if possible.

I recently worked on a project for which I did some interviews. Before I began, someone in power said, "This point is [this] and I doubt anyone will say different." Really? I couldn’t believe my ears. Personally, I find assumptions to be a bad thing because unlike hypotheses, there's no room for disproof or differing opinions.

Meanwhile, I received a research report. One takeaway was that vendors are failing to deliver "what end customers want most." The accompanying infographic shows, on average, that 15.5% of end customers want what 59% of vendors don't provide. The information raised more questions than it answered on several levels, at least for me, and I know I won't get access to the raw data.

My overarching point is that bias is rampant and burying our heads in the sand only makes matters worse. Ethically speaking, I think as an industry, we need to do more.

What's your experience? We'd love to hear the types of bias you encountered, the forms they came in, what was done about the bias (if anything), and the outcome.

Lisa Morgan, Freelance Writer

Lisa Morgan is a freelance writer who covers big data and BI for InformationWeek. She has contributed articles, reports, and other types of content to various publications and sites ranging from SD Times to the Economist Intelligence Unit. Frequent areas of coverage include big data, mobility, enterprise software, the cloud, software development, and emerging cultural issues affecting the C-suite.


Re: The marshmallow test is generally misinterpreted
  • 9/30/2016 9:15:18 PM

I think it's more about how one views human nature. There's definitely a balance to be struck in how man behaves, i.e. man can have habits that make it easier to act consistently, yet choose to go against his habits at any given point. A less-than-holistic view of humanity will make the study look like it's supporting a certain leaning.

Re: The marshmallow test is generally misinterpreted
  • 9/30/2016 12:18:06 PM

Ooh...sounds like Simpson's Paradox might be at work!  (Analysis of the same data leads to opposite results at different levels of aggregation.)
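Simpson's Paradox is easy to demonstrate with a small, entirely made-up table: each subgroup favors treatment A, yet the pooled totals favor B. All counts below are illustrative, not from any study cited here.

```python
# Hypothetical (successes, trials) per treatment and per severity subgroup.
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

for treatment, groups in data.items():
    for group, (ok, n) in groups.items():
        print(f"{treatment} / {group}: {ok}/{n} = {ok/n:.0%}")
    total_ok = sum(ok for ok, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    print(f"{treatment} overall: {total_ok}/{total_n} = {total_ok/total_n:.0%}")
# A wins within "mild" and within "severe", but B wins overall,
# because A handled far more of the hard (severe) cases.
```

The reversal happens because the treatments face very different case mixes, which is exactly why the level of aggregation can flip a conclusion.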

The marshmallow test is generally misinterpreted
  • 9/30/2016 11:56:09 AM

The media generally portrays the famous Marshmallow Test as proof that a certain behavior developed as a child will yield future success. However, I learned recently from NPR's Invisibilia podcast that the proponent of this research actually proved the exact opposite: that at any point in a person's life, one can make a decision that's different from previous behavior. The same kids who would previously have nabbed the marshmallow CAN choose to wait if given a greater good to strive for. The podcast is here:

Re: The biggest bias
  • 9/27/2016 12:58:59 PM

@jamescon, great point on the importance of data selection. Data quality goes beyond clean data; applicability is paramount.

Re: The biggest bias
  • 9/27/2016 12:54:38 PM

@Kq4ym, you bring up the presidential debate. Well noted that the vast majority go into the debate firmly entrenched in their leaning, and dislodging them is unlikely; the goal is to maintain engagement and, hopefully, spur enthusiasm and involvement through donations and volunteering. Just feeding the established position.

Re: The biggest bias
  • 9/27/2016 12:08:24 PM

Our human natures, aka biases, are built in and often take an extraordinary amount of insight to overcome. The Presidential debates may illustrate how individuals, even those who hear both sides, most often will not switch sides or opinions because of those built-in biases. How political advisors may take advantage of that is an interesting exercise in psychology, if not in devising political data studies.

Re: The biggest bias
  • 9/26/2016 2:05:14 PM

Yes, @SethBreedlove!  That is one reason I am constantly trying to make a distinction between assumptions and hypotheses.  The thought process is different!

Assumption = I am right

Hypothesis = I may or may not be right.

And yes, consistent commentary about the first data being that which we trust. I believe we need to shift the thinking process generally.

Re: The biggest bias
  • 9/26/2016 1:35:07 PM

I think one of the best ways to deal with bias is to try to disprove your own ideas. It's actually surprisingly easy to do and a little humbling to find all the holes in our original theories. One of our human flaws is that whatever information we get first is what we hold on to the strongest. Everything after that has the burden of disproving the first. And it's uncomfortable to tell our brains that they have faulty information and need to rewire themselves.

Re: The biggest bias
  • 9/23/2016 12:20:00 PM

Excellent point, @Jamescon.

Re: The biggest bias
  • 9/23/2016 9:50:48 AM

@memetzga. Great points about how easy it is for even an experienced data pro to fall into these traps.

I'll add one other consideration that I haven't seen brought up in discussions like this. With the adoption of data-driven decision making and the popularity of analytics, we need to revisit data sources and the applications they feed periodically for a sanity check. It's easy to assume that a particular data feed that we have been using for a couple years is still valid. However, things happen in real life; a data source may start using different criteria and definitions, or -- shockingly -- our systems break. That sales trend that seems so important might be about customer behavior changing but it also might be a sign that one data source changed the rules last month.
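That kind of periodic sanity check can be as simple as comparing a recent batch from a feed against a stored baseline. This is only a sketch; the `drift_alert` helper, its z-score threshold, and the sample numbers are my own illustration, not a standard API.

```python
from statistics import mean, stdev

def drift_alert(baseline, recent, z_threshold=3.0):
    """Flag a feed whose recent batch mean drifts far from the baseline.

    Crude check: treat the baseline as the reference distribution and
    alert when the recent mean is more than z_threshold standard errors
    away. Hypothetical helper for illustration only.
    """
    se = stdev(baseline) / (len(recent) ** 0.5)
    z = abs(mean(recent) - mean(baseline)) / se
    return z > z_threshold

# A feed that quietly changed its definitions upstream shows up as drift.
baseline = [100 + (i % 10) for i in range(200)]       # stable history
recent_ok = [100 + (i % 10) for i in range(50)]       # same behavior
recent_changed = [130 + (i % 10) for i in range(50)]  # rules changed last month

print(drift_alert(baseline, recent_ok))        # False: no alert
print(drift_alert(baseline, recent_changed))   # True: investigate the feed
```

A check like this won't tell you whether the shift is a real trend or a broken pipeline, but it tells you to stop and ask before the dashboard does the asking for you.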
