Math vs. Data: Exploring the Big-Data Buzz

The emphasis on big-data and big-data analytics has focused the predictive analytics discussion on the overwhelming importance of the data itself, rather than the mathematics.

This is not to say that data was never considered important in building predictive analytics solutions. But prior to the big-data hype, much of the emphasis was on the mathematics and on identifying the next breakthrough technique or technology. Putting on their marketing science hats, practitioners would test these new techniques and technologies and determine whether they yielded incremental value.

In most cases, these newer techniques/technologies yielded minimal value over and above the traditional multivariate techniques such as logistic regression and multiple regression. But why is this the case if newer techniques and technologies -- in theory -- can produce powerful results, as seen in academic research and other non-business settings?

Business scenarios differ from those in academia and other research settings because of the data environment, in which the so-called random error component is, in many cases, quite large. Certainly, this random component is much larger than in the more esoteric and pristine data environments that exist within research academia.

In the practical world of business, our ability to explain the behavior we are trying to predict is quite limited. The more traditional and simple multivariate techniques work quite well under these scenarios: given the limited analytical potential, they deliver acceptable results, while the more advanced techniques yield minimal improvement in performance.

This is best demonstrated by looking at the R2 of a multiple regression equation, where R2 values are often well below 10 percent, implying that the predictive analytics solution explains 10 percent or less of the variation in the targeted behavior. With so much unexplained variation, the real risk in employing newer techniques and technologies is that some of this truly random variation appears to become explained -- in other words, the model fits noise.
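To make the low-R2 scenario concrete, here is a minimal sketch on hypothetical data (the coefficients, sample size, and noise level are all assumptions, not figures from the article): a target driven mostly by random noise, fit with ordinary least squares, yields an R2 well under 10 percent.

```python
import numpy as np

# Hypothetical illustration: a behavioral target whose variation is
# dominated by a large random error component, as described above.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                 # three predictor variables
signal = X @ np.array([0.2, 0.15, 0.1])     # small systematic component
noise = rng.normal(scale=1.0, size=n)       # large unexplained component
y = signal + noise

# Ordinary least squares via numpy's least-squares solver.
X1 = np.column_stack([np.ones(n), X])       # add an intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ beta
r2 = 1 - resid.var() / y.var()
print(f"R-squared: {r2:.3f}")               # well under 10 percent
```

With the signal variance this small relative to the noise, no modeling technique, however advanced, can push R2 much higher; apparent gains beyond that ceiling are the overfitting risk the article describes.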

The consequence is overstatement of results during development, followed by disappointing results once the solution is implemented.

Yet, our real opportunity to improve results resides with the data itself. Altering data inputs, or creating new variables, can significantly improve performance.

This is evident from how model performance and R2 vary significantly with the data inputs or variables used in the model. Putting the right eggs in your basket is the key driver in building effective solutions, which is why practitioners spend most of their time creating the analytical file and the potential data inputs into any model.
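A small hypothetical sketch of this point: the same regression technique, applied twice, once on raw inputs and once with one derived variable added. The data-generating process, coefficients, and the interaction term are all assumptions chosen for illustration, not anything from the article.

```python
import numpy as np

# Hypothetical data: the target depends partly on an interaction
# that the raw inputs, taken individually, do not expose.
rng = np.random.default_rng(1)
n = 5000
x1, x2 = rng.normal(size=(2, n))
y = 0.3 * x1 + 0.5 * (x1 * x2) + rng.normal(scale=1.0, size=n)

def r_squared(features, y):
    """Fit ordinary least squares and return R-squared."""
    X = np.column_stack([np.ones(len(y))] + features)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_raw = r_squared([x1, x2], y)               # raw inputs only
r2_derived = r_squared([x1, x2, x1 * x2], y)  # plus one derived variable
print(f"raw inputs:   {r2_raw:.3f}")
print(f"with derived: {r2_derived:.3f}")
```

The derived variable lifts R2 far more than swapping in a fancier algorithm on the raw inputs could, which is the "right eggs in your basket" argument in miniature.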

It's not unusual that 80 to 90 percent of the practitioner’s time is spent in this area working on the data, with the remaining time conducting the more advanced routines.

Most experienced practitioners welcome the discussion and hype surrounding big-data. It has reinforced the discipline of data, and more importantly, the process of creating the right data environment. New terms, such as data science, now attest to the importance of data within the predictive analytics process. Describing data as a science implies that practitioners take a rigorous and methodical approach in dealing with data.

But from a practitioner’s standpoint, is any of this really new? Data as a discipline has always represented the bedrock foundation in the development of any predictive analytics solution, after all. What do you think?

Richard Boire

Richard Boire's experience in database marketing and predictive analytics dates back to 1983, when he received an MBA from Concordia University in finance and statistics. His initial experience at organizations such as Reader's Digest and American Express allowed him to become a pioneer in the application of predictive modeling technology for all direct marketing programs. This extended to the introduction of models that targeted the acquisition of new customers based on return on investment and, ultimately, customer profitability. With this experience, he formed his own consulting company in 1994. Now called the Boire Filler Group, the firm is a Canadian leader in offering analytical and database services to companies seeking solutions to their predictive analytics or database marketing challenges. Boire is a recognized authority on predictive analytics and is among the top five experts in this field in Canada, with expertise and knowledge that is difficult, if not impossible, to replicate. He gives seminars on segmentation and predictive analytics for such organizations as the Canadian Marketing Association (CMA), Direct Marketing News, Direct Marketing Association Toronto, and the Association for Advanced Relationship Marketing. His articles have appeared in Canadian publications including Direct Marketing News, Strategy Magazine, and Marketing Magazine. Supplementing his written materials, he has spoken internationally at such conferences as the Database Marketing Conference, Ukraine Direct Marketing Conference, and Predictive Analytics World. He has pioneered the training and development seminar at UNI Strategic entitled, "Using Predictive Analytics to Compete in the New Economy." Boire has taught applied statistics, data mining, and database marketing at a variety of Canadian institutions, including University of Toronto, Concordia University, George Brown College, and Centennial College.
He chairs the CMA's customer insight and analytics committee and sits on the CMA's board of directors. He has chaired numerous full-day conferences on behalf of the CMA, and has co-authored whitepapers on the following topics: "Best Practices in Data Mining" and "Customer Profitability: The State of Evolution Among Canadian Companies."

Re: Not so different
  • 1/9/2013 7:51:05 PM

As mentioned, research in labs and academia takes place in pristine environments. In business, things are much messier. A change in x will cause a change in y, but that might cause an unknown change in A, B, C. In real life, individuals are not pool balls and may not go in the direction you hit them. However, we are getting better at predicting. A lighthouse doesn't get rid of the fog, but it is sure better than no lighthouse at all. I think this is why five years is considered a long-term business plan here in the U.S.

Re: Not so different
  • 1/9/2013 11:35:01 AM

Phil, what would you consider an ongoing issue?

Re: Not so different
  • 1/9/2013 10:32:57 AM

The types of data are new. So are the amounts. Ditto the velocity. But many of the tools and issues are pretty similar to those of previous decades.

Re: Not so different
  • 1/9/2013 10:03:27 AM

I would agree with you that perhaps the new terms are more consumer-friendly. But I also think that definitions and descriptions of the predictive analytics discipline have multiplied due to increased public understanding of the value of analytics. Predictive analytics is no longer the new frontier in business that it was 15 years ago. It is now more mainstream, with consultants attempting to brand the discipline with new terms such as data scientist. Is that really different from a data miner, which is what we called ourselves 15-20 years ago?

Not so different
  • 1/9/2013 9:18:27 AM

Hi Richard. I think one of the benefits of framing the predictive analytics role in "consumer friendly" terms like big-data and data science is that it makes the whole discipline more approachable for non-practitioners, even if behind the scenes things haven't changed much. That's important in overcoming business resistance to analytics-based decision making.