Data Scientists Will Not Be Replaced by Automation

If you don't have a good definition of what you are debating, you may be talking past your opponent. This is even more likely when the subject is a profession that has emerged only in the last few years and whose definition may be somewhat fuzzy in the minds of debaters and readers alike.

While there has been no shortage of commentary about what makes data scientists tick (see my brief history of data science), a new in-depth discussion of how they work was just published by the Harvard Business Review, just in time to inform this debate.

In "Data Scientist: The Sexiest Job of the 21st Century," business and technology consultant Tom Davenport and D.J. Patil, a data scientist in residence at Greylock Partners, tell us about Jonathan Goldman, the data scientist who came up with the "people you may know" feature on LinkedIn: "He began forming theories, testing hunches, and finding patterns that allowed him to predict whose networks a given profile would land in." Based on this and other observations of data scientists, Davenport and Patil generalize about how they work:

What data scientists do is make discoveries while swimming in data... [their] dominant trait is intense curiosity -- a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested. This often entails the associative thinking that characterizes the most creative scientists in any field.

Perhaps it's becoming clear that the word 'scientist' fits this emerging role... their greatest opportunity to add value is not in creating reports or presentations for senior executives but in innovating with customer-facing products and processes.

Davenport and Patil don't provide a concise definition of a data scientist, but for the purpose of this debate, let me offer a working definition: A data scientist is an engineer who employs the scientific method and applies data-discovery tools to find new insights in data. The scientific method -- the formulation of a hypothesis, the testing, the careful design of experiments, the verification by others -- comes from their knowledge of and training in statistics. The application (and tweaking) of tools comes from their engineering or, more specifically, computer science and programming background. The best data scientists are product-and-process innovators and, sometimes, developers of new data-discovery tools.

You need humans to do this kind of work. Data science as a discipline is in its infancy, and as it evolves we'll no doubt see some activities that are done manually today (e.g., data cleansing) automated in the future. But a person -- with the right qualifications -- will always have to be there to tell the machine what to do.

Even more important, making a statement about automation (of any activity and profession) in the future assumes that what we do today will stay the same forever. This is obviously a ridiculous assumption, nowhere more ridiculous than in the domain in question, the application of computer technology. While there was excited talk about how automation would replace software engineers, some members of this soon-to-be endangered species went on to develop Hadoop and other foundation technologies and tools of the big-data ecosystem. There will always be new challenges and it will always be humans, not machines, identifying the need and building the solutions.

The need for communications -- i.e., the ability to explain to clueless business executives what the data means -- is another argument sometimes voiced in support of the notion that tools can't replace data scientists. I tend to be a bit more generous toward business executives (having been one in the past) so I would suggest that the discoveries expected of data scientists don't only happen when the data scientist is communing with the data.

New insights often emerge when data scientists and business executives (or anyone else with strong domain expertise) discuss and brainstorm what questions to ask, what the results of the analysis actually mean, and what the next iteration should be. This is what lies behind the requirement for people skills or business acumen often included in the basket of skills expected of a data scientist. The ability to communicate the results of the analysis is indeed important, but it can be replaced, at least to some extent, by good data visualization tools. But brainstorming with a machine? I don't think so.

Finally, whenever we discuss humans vs. machines, advocates for machines usually don't fail to mention the human follies and foibles that are obviously absent from inanimate matter. In our context, the famous dictum from Marissa Mayer, then a Google search executive and now Yahoo CEO, comes to mind: "Data is apolitical." At her former employer, Google (and one would assume now at Yahoo), data, not politics, drives all decisions. Really? As Michael Schrage, a research fellow at MIT Sloan School's Center for Digital Business, noted in an HBR blog, "some data are apparently more apolitical than others: the closure of Google Labs, for example, as well as its $12.5 billion purchase of Motorola Mobility are likely not models of data-driven 'best-practice.' "

The bigger point is that data could be "political" because people use it and people have agendas. They can ask the wrong questions, design the wrong tests, and make the wrong interpretations to fit their agendas and what they want the answer to be. The answer to human biases is not to replace humans with machines or data (which anyway could be guided by human biases). The answer is the scientific method. This is how and why science has made progress in the last 350 years, insulating scientific inquiry from human bias.

Which is why the new profession of data scientists promises to inject -- to some extent -- new objectivity into the decision-making process whenever it is possible to conduct an experiment and draw conclusions from the data. But the formulation of the hypotheses, the design of the experiments, and the interpretation of the results will not be done by machines. I'm certain machines will also fail to invent new products and processes.

To what extent do you see automation playing out in the world of big-data and data science? Read our Point post, and share your opinions on this debate on the message boards.

Point / Counterpoint, Managing Partner, gPress


Re: Amen!!
  • 10/3/2012 12:39:52 PM

@Mr_BDMartin201 Great observations, I specifically like "humans still have a shelf life"... To your question, I attended the first day of PAW Boston and in the sessions I attended there was no talk of automation. One presentation, however, by Lattice Engines, described how they improve sales reps' productivity by providing a tool that does some of the research and analysis work for them, e.g., delivering potential new leads or information about prospects. Which got me thinking that there are 2 aspects of "automation": One is when manual work is completely replaced by software. Another is when the software tool augments what humans do and sometimes performs tasks that humans simply don't have the time to perform.

Re: Amen!!
  • 10/3/2012 12:04:46 PM

@Mr_BDMartin201 -- financial services is often ahead of the technology curve, I wonder why that's not the case here. Your thoughts?


As for PAW in Boston, I did not attend myself -- so can't help you there!

Re: Amen!!
  • 10/3/2012 10:20:24 AM


Thank you for responding to me. I certainly hope that we get greater amounts of automation!! I work on the process side and I do automation, but in Financial Services there is still too little automation compared with tangible-goods industries like chemicals and consumer goods. Sometimes the pace of automation can be driven by the annual budget. Great point that the data scientist's role will change also, not just the analytics.

I want to ask you, or anyone on this post, whether you attended Predictive Analytics World in Boston. Did you see anything that addressed this analytics automation?


Re: Amen!!
  • 10/3/2012 7:45:26 AM

@Mr_BDMartin201, first off, thanks for jumping into the conversation here. I agree with you that analytics will continuously change, but to me that continuous change means increasing amounts of automation -- aimed at addressing the perpetual goals of optimization and greater agility. Who's to say that as the tools advance the role of data scientist doesn't change along with that, to the point, as Perlowitz suggests in the counterpoint piece, that they're focused on the "preparation, management, and integration of in situ data, and managing data provenance."

Re: Amen!!
  • 10/3/2012 7:41:26 AM

@Mr_BDMartin201, first off, thanks for jumping into the conversation here. While I agree that analytics is ever changing, I believe as part of its evolution it becomes more and more automated. Who's to say, as Perlowitz suggests in the counterpoint piece, that as the analytics tools themselves advance the data scientists devolve to a role in which they're focused on the "preparation, management, and integration of in situ data, and managing data provenance?"

  • 10/2/2012 3:14:08 PM

Data scientists will set up partial automation programs so that analytics can be output from their experiments and from collecting data on existing processes. However, new analytics are coming out all the time, and I don't see this happening at 100%.

So analytics, like humans, have a shelf life, and they will keep changing.


Data Artists
  • 9/30/2012 8:36:57 AM

Here is a wonderful blog post and accompanying TED talk video about "data artist" Jer Thorp. His work is political in the sense that a certain ethos pervades it, and I suppose it's possible to disagree with that ethos and the publications for which he writes.

But he inspires me because a mind like his couldn't be replicated by a computer in a million, billion, trillion years!

Re: Judgement
  • 9/30/2012 2:05:28 AM

Re: "I think data scientists will not be replaced by automation in the same way that autopilots haven't replaced human pilots."

Well, let's see what happens with driverless cars.

Re: Data, for me, has always been political
  • 9/30/2012 2:04:00 AM

Good points, magneticnorth, but I think it's a matter of the precision of the words and their definitions.  Data are apolitical; analytics are not.  Depending upon whom you ask, X may mean Y, or it may mean Z.

Data in politicking
  • 9/30/2012 2:01:40 AM

Data may be apolitical, but politics is not "adatal."  There is certainly an ROI behind political decisions and political consequences; it's just a matter of what that ROI is.  The only real question is to what extent "political" data are measured.
