Set aside your common sense when working with data, and watch out. "It can really bite you," said Dean Abbott, internationally recognized predictive analytics expert and author of the new book Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst. "The great thing about predictive modeling and algorithms and machine learning is the models are induced from the data. Of course... the biggest weakness of machine learning techniques is that the models are induced from the data," he told listeners of this week's A2 Radio episode, "The Art of Predictive Modeling" (register now and listen on demand).
During the radio broadcast, Abbott shared a wealth of sensible advice for predictive modelers. Here are five of his tips:
- Know your science, but embrace the art of predictive modeling. As a predictive modeler, you're going to have to have algorithm chops. Your linear regressions, k-means clusters, and neural networks aren't going to build themselves, after all. But how you do your analysis may be quite different from the way somebody else does -- and that's OK, Abbott said. "People attack problems in different ways," which gets to the art of the discipline, he said. And, unless you're doing leading-edge work, the science is only going to take you and your model so far. "You [might] build the coolest random forest ever, only to find out it's completely useless because it's not addressing the business objective." In other words, aligning a model to the business objective is an art.
- Don't dis domain expertise. Along with big data's arrival on the analytics scene is the belief of some in the data community that "just as long as we have enough data we don't need domain experts any more." That's just not true, Abbott said. Data can be a tricky devil. "We don't understand all the ways that data can deceive us, and if we just rely on the data alone, we can be fooled into thinking we have something that's good when we really don't."
- Get IT buy-in from the get-go. Just as much as you need to partner with those keepers of the business knowledge mentioned above, you need to involve IT -- or whoever is holding the data -- in your predictive modeling projects. "You need to know where the data is stored, how one can access that data, and what the data means. Without IT buy-in, many predictive models fail," Abbott said. It happens, he added. "You discover all the data you need to build the models, but the IT fiefdom has erected a wall so high that you can't climb over it and you have to change what you do with predictive modeling or you'll never be able to deploy the model."
- Don't be married to your model. Abbott said he smiles when he's asked about iterative modeling because it gets exactly to what he finds to be quite useful when working with predictive models. "And that is getting the model out the door for assessment as quickly as possible, even if it doesn't have all the bells and whistles you'd like in it." Building the model oftentimes isn't particularly difficult, after all. "So if you can get to the end and show this is what the model is seeing and get some quick feedback, you have time to correct all the misconceptions along the way. It's almost always an iterative process, where the first thing that comes out of a decision tree or neural net or whatever the algorithm is isn't what you'll be using ultimately." So, build the model, show stakeholders what it's finding, answer all their questions… and make your adjustments, he advised.
- Remember the back-end. As you work on your model, make sure you know how it's going to be deployed. Will it be deployed in software or will you need to do an ad hoc deployment? Will the model need to fit into some operational system? If the latter is the case, then you need to know what form the model needs to take, Abbott said. Your modeling algorithm may be easy enough to encode using C, Java, SQL, or the Predictive Model Markup Language (PMML), but not so with all the data prep you've done, he warned. So, if you've monkeyed around with the data, filling in missing values or creating derived attributes for example, you're going to have to redo all those computations. So it behooves you to keep good notes on that data prep. "So if you're pulling data from a data warehouse and then bringing it into your data mining environment and doing all your data prep, keep note of what you're doing and then as much as possible push that back up to the database so that when you're scoring models later you're pulling data from a modeling table that's already got this data prep built in and then push that out to your scoring, or whatever."
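Abbott's point about data prep in the last tip can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not his method: the field names (`total_spend`, `visits`), the derived attribute `spend_per_visit`, the median fill rule, and the helpers `fit_prep` and `apply_prep` are all hypothetical. The idea is simply that the prep decisions made during modeling get recorded once and then replayed unchanged at scoring time.

```python
import json
from statistics import median


def fit_prep(rows, numeric_fields):
    """Learn fill-in values from the training rows (median of observed values)."""
    impute = {}
    for field in numeric_fields:
        observed = [r[field] for r in rows if r.get(field) is not None]
        impute[field] = median(observed)
    return {"impute": impute}


def apply_prep(row, prep):
    """Replay the recorded prep on one row; same code path for training and scoring."""
    out = dict(row)
    # Fill missing values with the values recorded at training time.
    for field, fill in prep["impute"].items():
        if out.get(field) is None:
            out[field] = fill
    # Hypothetical derived attribute: spend per visit, guarding against zero visits.
    out["spend_per_visit"] = out["total_spend"] / out["visits"] if out["visits"] else 0.0
    return out


# Hypothetical training data with some missing values.
training_rows = [
    {"total_spend": 120.0, "visits": 4},
    {"total_spend": None, "visits": 2},
    {"total_spend": 80.0, "visits": None},
    {"total_spend": 200.0, "visits": 10},
]

prep = fit_prep(training_rows, ["total_spend", "visits"])
prep_json = json.dumps(prep)  # the "good notes": store this alongside the model

# At scoring time, reload the recorded prep instead of recomputing it.
scoring_prep = json.loads(prep_json)
scored_input = apply_prep({"total_spend": None, "visits": 5}, scoring_prep)
```

In practice the same recorded steps could be pushed into a database view or modeling table, as Abbott suggests, so that scoring pulls data with the prep already applied.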
For more of Abbott's advice, tune in to the show on demand and read through the Q&A on our message board below the player. And share your own common-sense advice for working with predictive models below.
-- Beth Schultz, Editor in Chief, AllAnalytics.com