Whether you consider yourself a business intelligence expert, a data miner, or a predictive modeler, you have to be smart about how you think about your discipline, treat your data, and work with the business. You might not hear it expressed as such, but common sense is a must.
Set aside your common sense when working with data, and watch out. "It can really bite you," said Dean Abbott, internationally recognized predictive analytics expert and author of the new book Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst. "The great thing about predictive modeling and algorithms and machine learning is the models are induced from the data. Of course... the biggest weakness of machine learning techniques is that the models are induced from the data," he told listeners of this week's A2 Radio episode, "The Art of Predictive Modeling" (register now and listen on demand).
This is problematic if the data has problems or is biased in some way, he added. That's because "the model will gleefully go and build whatever it finds in the data, which may not be what you expect or what you really want."
During the radio broadcast, Abbott shared a wealth of sensible advice for predictive modelers. Here are five of his tips:
Know your science, but embrace the art of predictive modeling. As a predictive modeler, you're going to need algorithm chops. Your linear regressions, k-means clusters, and neural networks aren't going to build themselves, after all. But how you do your analysis may be quite different from the way somebody else does -- and that's OK, Abbott said. "People attack problems in different ways," which gets to the art of the discipline, he said. And, unless you're doing leading-edge work, the science is only going to take you and your model so far. "You [might] build the coolest random forest ever, only to find out it's completely useless because it's not addressing the business objective." In other words, aligning a model to the business objective is an art.
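To make the science-vs.-art point concrete, here is a minimal sketch of the "science" side -- fitting a one-variable linear regression by ordinary least squares. The function name and the toy data are illustrative, not from Abbott's talk; the point is that the mechanical fit is the easy part.

```python
# Illustrative sketch: fitting y = slope*x + intercept by least squares.
# The data and function name are made up for demonstration.

def fit_linear(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1.
slope, intercept = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])
```

Fitting the line takes a dozen lines of code; deciding whether a linear model even answers the business question is the part no algorithm does for you.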
Don't dis domain expertise. Big data's arrival on the analytics scene has brought with it the belief, among some in the data community, that "just as long as we have enough data we don't need domain experts anymore." That's just not true, Abbott said. Data can be a tricky devil. "We don't understand all the ways that data can deceive us, and if we just rely on the data alone, we can be fooled into thinking we have something that's good when we really don't."
Get IT buy-in from the get-go. Just as much as you need to partner with those keepers of the business knowledge mentioned above, you need to involve IT -- or whoever is holding the data -- in your predictive modeling projects. "You need to know where the data is stored, how one can access that data, and what the data means. Without IT buy-in, many predictive models fail," Abbott said. It happens, he added. "You discover all the data you need to build the models, but the IT fiefdom has erected a wall so high that you can't climb over it and you have to change what you do with predictive modeling or you'll never be able to deploy the model."
Don't be married to your model. Abbott said he smiles when he's asked about iterative modeling because it gets exactly to what he finds to be quite useful when working with predictive models. "And that is getting the model out the door for assessment as quickly as possible, even if it doesn't have all the bells and whistles you'd like in it." Building the model oftentimes isn't particularly difficult, after all. "So if you can get to the end and show this is what the model is seeing and get some quick feedback, you have time to correct all the misconceptions along the way. It's almost always an iterative process, where the first thing that comes out of a decision tree or neural net or whatever the algorithm is isn't what you'll be using ultimately." So, build the model, show stakeholders what it's finding, answer all their questions… and make your adjustments, he advised.
Remember the back-end. As you work on your model, make sure you know how it's going to be deployed. Will it be deployed in software or will you need to do an ad hoc deployment? Will the model need to fit into some operational system? If the latter is the case, then you need to know what form the model needs to take, Abbott said. Your modeling algorithm may be easy enough to encode using C, Java, SQL, or the Predictive Model Markup Language (PMML), but not so with all the data prep you've done, he warned. So, if you've monkeyed around with the data, filling in missing values or creating derived attributes for example, you're going to have to redo all those computations. So it behooves you to keep good notes on that data prep. "So if you're pulling data from a data warehouse and then bringing it into your data mining environment and doing all your data prep, keep note of what you're doing and then as much as possible push that back up to the database so that when you're scoring models later you're pulling data from a modeling table that's already got this data prep built in and then push that out to your scoring, or whatever."
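Abbott's point about redoing data prep at scoring time can be sketched in a few lines. This is a hedged illustration, not his method: the field names ("income", "household_size"), the fill value, and the derived attribute are all hypothetical. The idea is to capture the prep in one function so training and scoring apply exactly the same transformations.

```python
# Hypothetical sketch: one prep function shared between training and scoring,
# so imputation and derived attributes are recomputed identically later.
# All field names and values below are illustrative, not from the article.

TRAIN_MEDIAN_INCOME = 52000  # statistic computed once, on the training data

def prep(record):
    """Apply the same imputation and derived attributes used in training."""
    out = dict(record)
    # Fill missing values with the training-time statistic,
    # not a fresh one computed on the scoring data.
    if out.get("income") is None:
        out["income"] = TRAIN_MEDIAN_INCOME
    # Derived attribute: must be recomputed the same way at scoring time.
    out["income_per_person"] = out["income"] / max(out["household_size"], 1)
    return out

scored_row = prep({"income": None, "household_size": 4})
```

This is the in-code analogue of Abbott's advice to push the prep back to the database: however you do it, the transformations live in one place that both the modeling table and the scoring run draw from.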
For more of Abbott's advice, tune into the show on demand and read through the Q&A on our message board below the player. And share your own common sense advice for working with predictive models below.
What a coincidence! I'll read that shortly. The iterative process may apply not only to modeling for customers; I can see it working for competitors, vendors, and even internal production processes, where occasional changes will dramatically affect the previous models.
@magneticnorth, funny you should mention the value of using iterative modeling in finance. We just posted on that very topic! If you haven't yet had a chance to read the post, you might like to! See: Model & Remodel to Find Profitable Customers
The tips brought home the idea of getting IT aboard the project. As stated, trying to climb over those high fences can sometimes bring the process to a halt. Getting all those folks aboard will certainly smooth the way for building models and implementing them successfully.
Right, @Jim. Abbott clearly knows his stuff, but it's refreshing to hear how much he values the opportunity to learn from others, too -- sitting over their shoulders and watching how they'd tackle a problem versus how he would himself. Not everybody at his level is so willing to set egos aside and open themselves to new approaches and ideas.
One of the things that Abbott brought to the radio show was the ability to look at a variety of factors (and people) that play a role in predictive analytics. On one hand he could delve into the technical aspects of building a model -- and the technical chops that are required on the team -- and then he could recognize the value of domain expertise.