Ask & Ask Again: Questions Matter in Modeling

If you want to make a name for yourself in the sexy data science field, you'd best get ready to ask a lot of questions. Without them, your modeling will be lackluster at best.

Here's how Mike Swinson, executive vice president at TrueCar, an online car-buying platform, put it during a recent presentation at IE's Predictive Analytics Innovation Summit in Chicago: "The role of data scientists and predictive modelers is to ask the insightful questions, to really hone in on the problem structures and leverage scalable data architectures to the greatest effect." Failing to take the business of asking questions seriously will make the task of extracting useful intelligence from raw data rather difficult, he added.

Let's set the science of data aside for a moment and think instead about the art of it. "This process of asking the right questions and really driving into and framing a problem structure is really what the art of modeling comes down to," Swinson suggested.

And yet, so many analytics teams -- across company and industry type -- shortchange the process, Swinson said. They'll take raw data, pump it into their predictive models, and expect great intelligence to pop out the other end. Voilà!

Far better, as Swinson has shown at TrueCar, is following a four-step process that moves from raw data to contextual information, then to actionable information, and finally to strategy.

Moving from one stage to the next takes asking the right questions, he said.

"To turn that raw data into useful, contextual information that really frames the problem structure takes asking the right questions. To take that information and move to actionable information again takes asking the right questions so you can structure this in terms of a predictive model that you can actually use to drive results. And then to go from that third stage to an actual strategy again takes actual asking of questions to be able to structure your strategies and make use of your various models... for effective use in your business decisions."

Swinson provided a couple of examples of how to ask the right questions, including how this idea comes into play for use with TrueCar's own dealer scoring algorithm. Via the algorithm, TrueCar aims to present the best dealers, in the optimum order, to consumers when they input their car-buying criteria. "It's a problem not too dissimilar to what you'd face, say, if you're at Google and you're in charge of designing the search engine algorithm."

Starting with Step 1, TrueCar has its raw data to consider -- bits of user-entered information such as location and vehicle specifications, prices and other information from dealers, and third-party consumer data, for example. To move through the process, TrueCar has to ask itself about the consumer's behavior so it can better understand from which dealers they'd most want to buy. Pricing, location, and selection are primary factors, but each needs digging into for transitioning raw data into intelligent information.

On pricing, for example, TrueCar might also have to factor in pricing relative to other dealers and to the manufacturer's suggested retail price. On location, radial distance or proximity between a buyer and dealer, based on ZIP code, probably isn't telling enough. More useful might be driving distance or drive time, and even those could need refinement. "A two-hour drive time in Chicago has a different meaning than a two-hour drive time in Billings, Montana," noted Swinson, adding, "All these things need to be factored into the way we're constructing that variable."
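To make the idea concrete, here is a minimal sketch of turning raw price and distance data into relative, context-aware variables. It is not TrueCar's actual method: the `Offer` fields, the `engineer_features` function, the use of the median as a "market" price, and the `typical_trip_minutes` baseline for a region are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Offer:
    dealer: str
    price: float          # dealer's quoted price (hypothetical field)
    msrp: float           # manufacturer's suggested retail price
    drive_minutes: float  # estimated drive time from buyer to dealer


def engineer_features(offers, typical_trip_minutes):
    """Convert raw offers into relative variables.

    typical_trip_minutes is an assumed regional baseline, so the same
    drive time can count differently in a dense city than in a rural area.
    """
    market_price = median(o.price for o in offers)
    rows = []
    for o in offers:
        rows.append({
            "dealer": o.dealer,
            # Fraction below MSRP (positive means a discount).
            "discount_vs_msrp": (o.msrp - o.price) / o.msrp,
            # Price relative to the other dealers' median price.
            "price_vs_market": (o.price - market_price) / market_price,
            # Drive time scaled by what is typical for the buyer's region.
            "relative_drive": o.drive_minutes / typical_trip_minutes,
        })
    return rows


offers = [
    Offer("A", 27000.0, 30000.0, 30.0),
    Offer("B", 28500.0, 30000.0, 60.0),
    Offer("C", 30000.0, 30000.0, 120.0),
]
features = engineer_features(offers, typical_trip_minutes=60.0)
for row in features:
    print(row)
```

Changing only `typical_trip_minutes` changes how the same two-hour drive is represented, which is the kind of refinement Swinson describes for the location variable.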

But that's not enough if the model is to zero in on each buyer as an individual and not as a composite. "DG might be product sensitive and needs to have that silver car with heated leather seats. Roy, on the other hand, might be very price sensitive. He doesn't really care about the specific product. He doesn't mind driving a little bit further as long as he can get the cheapest, best deal he can get with similar specifications."
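One simple way to picture buyer-level differences is a weighted score in which each buyer has their own weights. The sketch below is only an illustration under assumed inputs: the dealer attributes (`price_appeal`, `spec_match`, `proximity`, each on a 0-to-1 scale), the weight values, and the `rank_dealers` helper are hypothetical and do not describe TrueCar's dealer scoring algorithm.

```python
def score(dealer, weights):
    """Weighted sum of a dealer's attributes for one buyer's preferences."""
    return sum(weights[key] * dealer[key] for key in weights)


def rank_dealers(dealers, weights):
    """Return dealer names ordered from best to worst for these weights."""
    ordered = sorted(dealers, key=lambda d: score(d, weights), reverse=True)
    return [d["name"] for d in ordered]


# Hypothetical dealers, with attributes already scaled to the 0-1 range.
dealers = [
    {"name": "Near exact match", "price_appeal": 0.2, "spec_match": 1.0, "proximity": 0.9},
    {"name": "Far bargain", "price_appeal": 0.9, "spec_match": 0.6, "proximity": 0.3},
]

# Assumed preference profiles: one buyer cares most about the exact product,
# the other cares most about price and tolerates a longer drive.
product_sensitive = {"price_appeal": 0.2, "spec_match": 0.6, "proximity": 0.2}
price_sensitive = {"price_appeal": 0.7, "spec_match": 0.2, "proximity": 0.1}

print(rank_dealers(dealers, product_sensitive))
print(rank_dealers(dealers, price_sensitive))
```

The same two dealers come out in opposite orders for the two profiles, which is the point of modeling the individual rather than the composite buyer.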

As TrueCar peels back the layers and leverages these myriad factors in its models, it's been able to "achieve massive increases in profitability and customer satisfaction," Swinson said. As an example, he cited TrueCar's Net Promoter Score, which, at greater than 70 percent, is one of the highest among Internet companies.

It all comes down to this basic reality: Models themselves aren't inherently intelligent.

"They're really searching for correlations. They have no knowledge whatsoever of causality, and so this is where we as data scientists and predictive modelers can really inform the decision-making process of the model. By asking the right questions, by structuring not only the individual variables but structuring the model itself, we can really drive to an understanding of causality and really isolate the factors that specifically lead to the effects we're looking for."

Do you ask enough questions during your modeling process? Share below.

Beth Schultz, Editor in Chief

Beth Schultz has more than two decades of experience as an IT writer and editor. Most recently, she brought her expertise to bear writing thought-provoking editorial and marketing materials on a variety of technology topics for leading IT publications and industry players. Previously, she oversaw multimedia content development, writing and editing for special feature packages at Network World. In particular, she focused on advanced IT technology and its impact on business users and in so doing became a thought leader on the revolutionary changes remaking the corporate datacenter and enterprise IT architecture. Beth has a keen ability to identify business and technology trends, developing expertise through in-depth analysis and early adopter case studies. Over the years, she has earned more than a dozen national and regional editorial excellence awards for special issues from American Business Media, American Society of Business Press Editors, and others.

Re: Asking the right questions
  • 11/30/2012 6:26:35 PM

@Lyndon_Henry I agree, survey results should help to narrow ideas that are "too broad." Numerous issues can be identified through these surveys. Work with the most problematic issues, such as security challenges, internal problems, etc. It is important to be able to prioritize and act on survey results with needed improvements.

Re: Asking the right questions
  • 11/29/2012 5:27:18 PM


Beth asks: Are you suggesting surveying business users/managers about what business problem they're hoping the analytics will help them address? Or are you speaking more generally of surveys, for customer satisfaction, etc.?


Well, certainly, surveying the potential users (managers, administrators, planners, etc.) of the results would be a terrific idea.

But I had general public surveys more in mind, since I've had more experience with those.

In either case, you need to try to define the key issues as narrowly as possible. Otherwise, I think you get survey results that are so broad and mushy they are close to useless (except lots of planners and decisionmakers like to get results that are so broad and mushy they can interpret them in ways to bolster their own preferred goals).

And if this is supposed to feed into the development of a model, you could really end up with some junk and perhaps a mess.


Re: Asking the right questions
  • 11/29/2012 3:22:40 PM

@Lyndon_Henry -- so are you suggesting surveying business users/managers about what business problem they're hoping the analytics will help them address? Or are you speaking more generally of surveys, for customer satisfaction, etc.?

Asking the right questions
  • 11/28/2012 11:05:41 PM


Beth's article asks

Do you ask enough questions during your modeling process? 


Where I've found asking (formulating) the right questions to be important has been in surveys. Not that they've all led to analytical models, but they have influenced decisionmaking, and some have fed into the modeling process. Learning to ask the right questions is also helpful in learning how to identify the critical variables that are needed to make good decisions and build more effective models.


Re: Answer the question, question the answers
  • 11/27/2012 1:25:02 PM

Callmebob, very true.  I like how this seems to really force introspection into the value of what is being modeled. Modeling data that may take too long of an arc to answer relative to the business need could benefit from this structure.

Answer the question, question the answers
  • 11/27/2012 1:15:06 PM

Here's where I think the data wizards need to have a bit of a marketing mind. In fact, they should work in consortium to create models that frame marketing's questions and address its tasks: identifying target markets, market research, product development, marketing mix, and monitoring results.