Unlearning Our Old Data Ways


As I pointed out in my first post, my new favorite quote, compliments of Nate Silver's The Signal and the Noise: Why So Many Predictions Fail -- But Some Don't, is: "We face danger whenever information growth outpaces our understanding of how to process it."

To me, "understanding" is the key word in this statement.

I believe that one of the reasons Silver's forecasts are good is that he has a multidisciplinary skillset. He has an understanding of statistics, the data, and the domain in which the prediction problem lies. As an economist, he has been trained to apply mathematical models to real-world problems, combining his understanding of statistics and econometrics to solve a very data-rich problem. Sounds like fun, actually.

Really, Silver's methodology isn't all that groundbreaking. Everything he does involves tried-and-true techniques that only seem revolutionary because he is successfully applying them in a way new to the domain. I think back to an old colleague, let's call him Carl, and his methodology for doing an internal business forecast, which was to put numbers into a spreadsheet and produce what I would call -- in polite company -- a naïve forecast. Poor Carl didn't know any better, and his results showed it. The forecast was awful. Carl wasn't a statistician or econometrician -- he was a finance guy. He knew finance.

When I helped Carl create a forecast that took into account seasonality, autocorrelation structure, structural changes, and so forth, you would have thought I had pulled out my wizard's staff and conjured a miracle. Carl was awestruck -- a lot like the people who have been following Silver's success. But I hadn't really done anything magical, and neither has Silver. We just applied existing techniques to a domain accustomed to doing things the old way.
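To make the gap between a naive forecast and a seasonality-aware one concrete, here is a minimal sketch in Python. It is not Carl's spreadsheet or the model described above: the data are synthetic (a linear trend plus a 12-month sine cycle), and the function names `make_series`, `naive_forecast`, `seasonal_naive_forecast`, and `mean_abs_error` are made up for illustration. It covers seasonality only; autocorrelation structure and structural changes would call for a proper time-series model.

```python
import math


def make_series(n_periods=48, season=12):
    """Synthetic monthly series: linear trend plus a yearly sine cycle (illustrative only)."""
    return [
        100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / season)
        for t in range(n_periods)
    ]


def naive_forecast(history, horizon):
    """Carry the last observed value forward for every future period."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season=12):
    """Reuse the value observed in the same position of the last full season."""
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def mean_abs_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hold out the final 12 months and score both forecasts against them.
series = make_series()
train, test = series[:36], series[36:]

naive_mae = mean_abs_error(test, naive_forecast(train, len(test)))
seasonal_mae = mean_abs_error(test, seasonal_naive_forecast(train, len(test)))

print(f"naive MAE:          {naive_mae:.2f}")
print(f"seasonal naive MAE: {seasonal_mae:.2f}")
```

On this made-up series the seasonal version misses only by the trend it ignores, while the flat naive forecast also misses the entire yearly cycle. Real business data would be noisier, so treat the comparison as a toy illustration of the idea and nothing more.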

To be successful, you have to understand the domain very well. You have to understand the data extremely well. You have to understand information technology at an expert level. You have to understand the tools in your toolbox. And you have to understand how to put all of this together in a creative way to solve the problem at hand. It takes experimentation, curiosity, and creativity. That's what I believe the term "data scientist" implies more than anything else.

Big-data is a multidisciplinary game. Success requires deep expertise in multiple disciplines as well as the creativity to solve problems in ways that haven't previously been done. That's one of the reasons there is such a shortage of people who can handle big-data effectively. We've been handling data in a certain way for so long that change is difficult, and our existing talent pipeline is still tooled for the old approach. Higher education is still churning out skillsets that are largely uni-disciplinary, and businesses are no different. I sometimes think that when I say the word "analytics" to most audiences, their brains translate the word into "business intelligence," with visions of OLAP cubes and KPI dashboards dancing in their heads. Repeat after me: text analytics, neural networks, nonlinear optimization, simulation, bootstrapping. Please, please don't show me another pretty BI presentation tool and ask me if it will meet my analytic needs. Please.

It's the newer businesses of the Internet Age that seem to be most effective in dealing with big-data, in part because they had to invent the new approach, and they had little to unlearn.

And speaking of unlearning, Silver's book is the first one I've encountered where most of the references use e-reader positions instead of page numbers. I guess I have to spring for the e-reader edition and adapt my research approach if I want to dig deeper. So, good luck to all of you old-schoolers out there. You've officially been left behind. Maybe you can find solace in your OLAP cubes.

And Silver, I'll be seeing you in the blogosphere in about four years, buddy. Keep tuning that model.

Mark Pitts, Data Scientist & Healthcare Executive

Mark Pitts is a data scientist and healthcare executive with more than 25 years of experience solving business problems with technology and analytics. He started programming at the age of 13 – writing his first program on paper because he didn't yet have a computer – and hasn't stopped since. Over the years, he's garnered advanced education and expertise in computing science and business domains, and has applied his multidisciplinary skillset in leading real-world implementations of enterprise resource planning, financial and business intelligence systems, and multimillion-dollar, greenfield development projects to solve enterprise-scale business challenges.

He ultimately progressed from the IT shop to the business, driving the financial performance of healthcare organizations in areas including managed care contracting, provider compensation, payment integrity, forecasting, clinical quality, medical billing, receivables management, and analytics. His innovative work has been recognized with a variety of awards, and his creations support benefits measured in billions of dollars.

In May 2013, Pitts will complete an additional graduate program at Texas A&M University, receiving a Master of Science in statistics with a dual emphasis in applied statistics and biostatistics. He undertook these studies with the recognition that advances in computing technology, the explosion of the electronically interconnected world, and advances in machine learning would combine to change the game, especially in healthcare. He has a passion for writing and public speaking, with a track record of highly rated appearances in a variety of venues, from business and executive conferences to technical and analytics conferences. He has been interviewed, quoted, and featured in a variety of print and online publications, and is currently developing a course designed to introduce business people to the power of data visualization and analytics to solve everyday business problems.

He is also developing a Website for advanced analytics and is incubating a book project that will gain more momentum post-graduation. Pitts is currently employed by a Fortune 25 company, and he lives in Minneapolis with his lovely (and patient) wife, their two teenagers, and four remarkably spoiled dogs. He is also still glowing – and is somewhat harder to live with – after Harvard Business Review declared that data scientist is the sexiest job of the 21st century. You can follow Pitts on Twitter @DatalyticSci.



Re: understanding what you don't know
  • 12/9/2012 4:46:31 PM

"I'd argue that you have to know that you don't know everything going in. There's an exploratory nature to Big Data that many people don't understand. It's not a checklist. It's a journey."

 

@Philsimon   Well said and couldn't agree more.

Re: BI (or reporting) as a platform for deploying analytic insight
  • 12/9/2012 4:10:44 PM

Hi Doug, It's good to hear from you.  Great point.  That "one version of the truth" idea is one of the most pernicious beliefs going.  What's the census in your hospital today?  Well, it depends on things like when you measure it, which systems you're sourcing from, and, perhaps most importantly, why you are asking the question.  I understand why people don't want different reports showing different numbers for that census, but there could be good reasons why they are different.  Perhaps the reports were intended to answer different questions, or they were sourced from different points in the life cycle.

One of my favorite quotes from the world of statistics is from George Box:

"All models are wrong, but some are useful."

My slightly tongue-in-cheek corollary to that is:

"All reports are wrong, but some are useful."

Reports involve measures of real-world events, and all measures are subject to error.

Re: BI (or reporting) as a platform for deploying analytic insight
  • 12/8/2012 2:04:55 AM

Hey Mark et al:

The way I think of it is that (most of) B.I. is about delivering up WHAT THE ORGANIZATION ALREADY KNOWS IS IMPORTANT. The various B.I. architectures, tools and typically heavy IT processes that are familiar to us all were designed to do just that, quickly, efficiently, consistently, and repeatedly. If B.I. has a dominant (if misbegotten) motto, it's got to be "One version of the truth." Which is very telling, because a singular truth is only conceivably possible when looking at what's already happened. *

In contrast, (not-B.I.) analytics is primarily about DISCOVERING WHAT IS NOT KNOWN, or projecting what could plausibly happen in the future. There can be no singular truth about the future, too many things can happen. So much depends on what the butterflies in Brazil decide to do. Or a black swan.

So B.I. and traditional reporting allow an organization to manage reactively to what has already happened. And the need to do that will probably never go away. But exploratory/predictive analytics help the organization discover new things, see into the future, and manage more proactively in anticipation of things that are likely to happen. 

Mark's reference to a "data ecosystem" is a useful, evocative phrase. I'm going to try to remember that.

 

(* Just for fun, I challenge proponents of "one version of the truth" to explain why we have 1+19,999 different books describing and explaining the life of Abraham Lincoln.)

 

 

Re: BI (or reporting) as a platform for deploying analytic insight
  • 12/6/2012 9:30:26 PM

@David.Pope - Great thoughts.  I think BI tools are a critical component of the big data ecosystem, and making the analytic result organic to the business process and easily consumable are a key part of the analytics value chain.  What I'm trying to do is broaden the horizons of those who don't understand there is something beyond star schemas, SQL, and OLAP cubes.

Re: understanding what you don't know
  • 12/6/2012 9:20:40 PM

@philsimon - Thanks for your comment - it's good to hear from you.  I couldn't agree more regarding the exploratory nature of analytics.  The two adjectives I use most often are "exploratory" and "iterative."  You don't have to know everything going in, but you do have to be able to learn quickly and "connect-the-dots" across previous experience and other disciplines.

BI (or reporting) as a platform for deploying analytic insight
  • 12/6/2012 12:02:01 PM

One of the main difficulties we all seem to continue to encounter is related to this mistaking BI and Analytics as being the same thing (thereby diluting the true business value derived when analytics really are used). I believe we can all agree that to be impactful analytic insight must be deployed into operation systems. From a technical perspective data scientists may rightly assume I am talking about how to run scoring (or the end result of developing a predictive model) in different systems/platforms etc... , however it has recently become more and more apparent to me that deploying analytics from more of a business perspective is in essence having the analytical based insight show up and be easily understood by others across an organization. This is where the problem comes in, because for the insight to show up and be understood it has to be in a "report" or in other words BI. This is where having to "sell" analytic value becomes very important, because most end consumers associate the value of analytics in the BI based report they receive. The best way I believe to show someone the difference would be to show someone a report with true analytics baked in and then the same exact report with the analytical based insight REMOVED, then ask them which gives more value. In the case Mark mentioned regarding forecasting the problem is a bit more difficult because the reports may look exactly the same from a formatting perspective, they may both have numbers and then graphs based on those numbers, it's just the numbers and graphs based on analytics provide a more accurate end result.

understanding what you don't know
  • 12/6/2012 7:08:46 AM

Good stuff, Mark.


I like this the most:

 

To be successful, you have to understand the domain very well. You have to understand the data extremely well. You have to understand information technology at an expert level. You have to understand the tools in your toolbox. And you have to understand how to put all of this together in a creative way to solve the problem at hand. It takes experimentation, curiosity, and creativity. That's what I believe the term "data scientist" implies more than anything else.

 

I'd argue that you have to know that you don't know everything going in. There's an exploratory nature to Big Data that many people don't understand. It's not a checklist. It's a journey.

Re: The old and the new
  • 12/5/2012 6:10:56 PM

@BethSchulz - Thanks for your comment.  BI still has a place in the ecosystem, even for me.  The point I'm trying to make is that we have more than just BI in our toolbox these days, but many people I encounter limit their thinking to BI solutions.  When I say analytics, I'm referring to BI but also to much, much more.

The old and the new
  • 12/5/2012 9:50:54 AM

Mark, interesting points you raise here. Let me start with this one: "I sometimes think that when I say the word "analytics" to most audiences, their brains translate the word into "business intelligence," with visions of OLAP cubes and KPI dashboards dancing in their heads. Repeat after me: text analytics, neural networks, nonlinear optimization, simulation, bootstrapping. Please, please don't show me another pretty BI presentation tool and ask me if it will meet my analytic needs. Please." Is this to say at your level BI isn't for you or that at any level BI isn't for a company any longer -- that it must advance its thinking? 

Silver
  • 12/5/2012 8:59:43 AM

Nate Silver has already inspired a "drunk Nate Silver" meme on Twitter, like this tweet from @jfruh: "Drunk Nate Silver waits 20 minutes for the G train, nods silently when it arrives, walks out of the station."

When asked about it, the real Silver said:

NATE SILVER: If only people knew the real drunk Nate Silver. I'm not so dark, necessarily. I just get into stupid arguments about sports with my friends. It's one thing when you have yourself, but it's another thing when you start to symbolize a movement and you don't really have control over it in a certain sense.
