If you want to make a name for yourself in the sexy data science field, you'd best get ready to ask a lot of questions. Without them, your modeling will be lackluster at best.
Here's how Mike Swinson, executive vice president at TrueCar, an online car-buying platform, put it during a recent presentation at IE's Predictive Analytics Innovation Summit in Chicago: "The role of data scientists and predictive modelers is to ask the insightful questions, to really hone in on the problem structures and leverage scalable data architectures to the greatest effect." Failing to take the business of asking questions seriously will make the task of extracting useful intelligence from raw data rather difficult, he added.
Let's set the science of data aside for a moment and think instead about the art of it. "This process of asking the right questions and really driving into and framing a problem structure is really what the art of modeling comes down to," Swinson suggested.
And yet, so many analytics teams -- across company and industry type -- shortchange the process, Swinson said. They'll take raw data, pump it into their predictive models, and expect great intelligence to pop out the other end. Voilà!
Far better, as Swinson has shown at TrueCar, is following a four-step process that moves from raw data to contextual information, then to actionable information, and finally to strategy.
Moving from one stage to the next takes asking the right questions, he said.
"To turn that raw data into useful, contextual information that really frames the problem structure takes asking the right questions. To take that information and move to actionable information again takes asking the right questions so you can structure this in terms of a predictive model that you can actually use to drive results. And then to go from that third stage to an actual strategy again takes actual asking of questions to be able to structure your strategies and make use of your various models... for effective use in your business decisions."
Swinson provided a couple of examples of how to ask the right questions, including how the idea applies to TrueCar's own dealer scoring algorithm. With the algorithm, TrueCar aims to present the best dealers, in the optimum order, to consumers when they input their car-buying criteria. "It's a problem not too dissimilar to what you'd face, say, if you're at Google and you're in charge of designing the search engine algorithm."
Starting with Step 1, TrueCar has its raw data to consider -- bits of user-entered information such as location and vehicle specifications, prices and other information from dealers, and third-party consumer data, for example. To move through the process, TrueCar has to ask itself about consumers' behavior so it can better understand which dealers they'd most want to buy from. Pricing, location, and selection are primary factors, but each needs closer examination before raw data becomes useful information.
On pricing, for example, TrueCar might also have to factor in pricing relative to other dealers and to the manufacturer's suggested retail price. On location, radial distance or proximity between a buyer and dealer, based on ZIP code, probably isn't telling enough. More useful might be driving distance or drive time, and even those could need refinement. "A two-hour drive time in Chicago has a different meaning than a two-hour drive time in Billings, Montana," noted Swinson, adding, "All these things need to be factored into the way we're constructing that variable."
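To make the variable construction concrete, here is a minimal Python sketch of the kind of relative features described above. It is not TrueCar's method: the `Offer` fields, the use of the median as the market reference, and the idea of dividing drive time by a typical local trip length are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Offer:
    dealer: str
    price: float          # dealer's quoted price, in dollars
    msrp: float           # manufacturer's suggested retail price, in dollars
    drive_minutes: float  # estimated drive time from buyer to dealer


def price_features(offer, all_offers):
    """Express a price relative to MSRP and to the other dealers' prices."""
    # Assumption: the median quoted price stands in for "the market".
    market_median = median(o.price for o in all_offers)
    return {
        "discount_vs_msrp": (offer.msrp - offer.price) / offer.msrp,
        "price_vs_market": offer.price / market_median,
    }


def relative_drive_time(drive_minutes, typical_local_minutes):
    """Scale a drive time by what is typical for the buyer's area.

    Assumption: typical_local_minutes is a hypothetical per-region baseline,
    so the same two-hour drive scores differently in different places.
    """
    return drive_minutes / typical_local_minutes
```

With a baseline like this, a 120-minute drive counts as three times the norm where a typical trip is 40 minutes, but only about 1.3 times the norm where a typical trip is 90 minutes. The baselines here are made up; the point is only that the raw number is reinterpreted in context.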
But refining the variables isn't enough if the model is to zero in on each buyer as an individual and not as a composite. "DG might be product sensitive and needs to have that silver car with heated leather seats. Roy, on the other hand, might be very price sensitive. He doesn't really care about the specific product. He doesn't mind driving a little bit further as long as he can get the cheapest, best deal he can get with similar specifications."
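One simple way to picture buyer-level scoring is a weighted sum in which each buyer carries his or her own weights. The sketch below is purely illustrative and is not TrueCar's dealer scoring algorithm; the feature names, the 0-to-1 scaling, the dealer values, and the two weight profiles are all hypothetical.

```python
def score_dealer(features, weights):
    """Weighted sum of dealer features; higher means a better match."""
    return sum(weights[name] * value for name, value in features.items())


def rank_dealers(dealers, weights):
    """Return dealer names ordered from best to worst match for one buyer."""
    return sorted(
        dealers,
        key=lambda name: score_dealer(dealers[name], weights),
        reverse=True,
    )


# Hypothetical features, each scaled to 0..1 where 1 is best for the buyer.
dealers = {
    "Dealer A": {"price": 0.9, "proximity": 0.3, "spec_match": 0.6},
    "Dealer B": {"price": 0.5, "proximity": 0.8, "spec_match": 1.0},
}

# Hypothetical weights: one product-sensitive buyer, one price-sensitive buyer.
product_sensitive = {"price": 0.2, "proximity": 0.2, "spec_match": 0.6}
price_sensitive = {"price": 0.7, "proximity": 0.1, "spec_match": 0.2}

print(rank_dealers(dealers, product_sensitive))  # ['Dealer B', 'Dealer A']
print(rank_dealers(dealers, price_sensitive))    # ['Dealer A', 'Dealer B']
```

The same two dealers come back in opposite order for the two buyers, which is the effect of treating each buyer as an individual rather than a composite. In practice the weights would have to be learned or inferred from behavior, not set by hand as they are here.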
As TrueCar peels back the layers and works these myriad factors into its models, it's been able to "achieve massive increases in profitability and customer satisfaction," Swinson said. As an example, he cited TrueCar's Net Promoter Score, which, at greater than 70, is one of the highest among Internet companies.
It all comes down to this basic reality: Models themselves aren't inherently intelligent.
"They're really searching for correlations. They have no knowledge whatsoever of causality, and so this is where we as data scientists and predictive modelers can really inform the decision-making process of the model. By asking the right questions, by structuring not only the individual variables but structuring the model itself, we can really drive to an understanding of causality and really isolate the factors that specifically lead to the effects we're looking for."
Do you ask enough questions during your modeling process? Share below.