Top 5 Challenges of Text Analytics

  • 10/16/2012

Text analytics, my research shows, has become a "have to have" technology for a majority of companies that use it.

So I've learned from the many companies I've talked to as I prepare Hurwitz & Associates' Victory Index for Text Analytics, a tool that assesses not just the technical capability of the technology but its ability to provide tangible value to the business (look for the results of the Victory Index in about a month). However, they also said challenges abound -- and those don't necessarily involve the text analytics software itself.

Here’s a quick look at five challenges users said they most often run into with text analytics.

  1. Data access. Often, companies will want to utilize more than one source of unstructured data for analysis, but gaining access to this data can be challenging. This is more than getting a hold of the Twitter fire hose for customer intelligence analysis. This is about the right to use internal or cross-company data stores like institutional document repositories in the face of corporate politics or delays due to operational procedures, like making formal requests for the data from IT.

  2. Managing expectations. In some organizations, text analytics can leave management with the idea that you can simply plug in the software, feed it text data, and have it automatically give you the answers. While you may be able to get some high-level answers this way using tools tuned for social media, the reality is that most of the time you’ll have to interact with the software, especially when it comes to building a taxonomy (see No. 5). Text analytics tends to be more semi-automatic than automatic.

  3. Trusting the data. On the flip side of managing inflated expectations is the need to establish trust in the data. This challenge can manifest itself in terms of data quality and as a cultural issue.

    Determining data quality for unstructured data is hard for many reasons: words have multiple meanings, and unstructured text can be noisy with typos, colloquialisms, and so on. Often, with text data you’re going to get about 70 percent to 80 percent accuracy. That can be a challenge for some people.

    Using text analytics in decision-making also requires a cultural change, which can be difficult. For example, in organizations that are used to classifying content manually, moving to a semi-automated approach can be a big shift, and people might not trust the classification schemes. They'll be skeptical -- sometimes because of the way the analysis is presented. For instance, structured data might indicate that people are buying a wireless company's phones. Because sales are up, executives might not believe that the unstructured data in call center notes or on the Web shows negative sentiment about the phones -- that customers are buying them only because their choices are limited. You need to be able to tell the story and make people understand the kind of analysis you can do with this new source of data. This can take time.

  4. Building the skills. The skills you’re going to need to analyze text will vary depending on the problem you’re trying to solve. Some people claim that you need to understand your industry. Others say being analytical is enough. If the goal is using a social media analytics tool to do some high-level analysis on brand reputation, you'd likely need only a small amount of training. But if you’re trying to combine structured and unstructured data to increase the lift of a predictive model, then you'll need deeper skills development. Regardless of the issue you’re looking to address, text analytics involves dealing with a new form of data and there is going to be a learning curve involved in knowing what to do and how to apply it to the business. You’ll also have to know how to ask the right kind of questions. This is a learning process.

  5. Taxonomy issues. A taxonomy is a method for organizing information, or sometimes categories, into hierarchical relationships. Because a taxonomy defines the relationships between the terms a company uses, it makes it easier to find and then analyze text. Some organizations hire people skilled in taxonomy development to build it. Some vendors provide out-of-the-box taxonomies for certain industries. Even so, you’re going to have to deal with the vagaries of the terminology in your industry, and there is going to be upfront work to specify this terminology. Many end-users feel that the necessary taxonomy development, or refining their categories (if that is the way you’re ultimately building a taxonomy), is difficult. It can take more than one try. Companies need to plan for this.
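To make the taxonomy idea in No. 5 a bit more concrete, here is a minimal sketch in Python of what a small hierarchy and a semi-automatic tagging pass might look like. Everything in it is an assumption for illustration: the categories, the terms, and the `classify` helper are hypothetical and do not come from any vendor's product. Real tools go well beyond simple term matching.

```python
import re

# Hypothetical two-level taxonomy for a wireless carrier:
# category -> subcategory -> terms that signal it.
TAXONOMY = {
    "device": {
        "battery": ["battery", "charge", "charging"],
        "screen": ["screen", "display"],
    },
    "service": {
        "coverage": ["coverage", "signal", "dropped call"],
        "billing": ["bill", "billing", "overcharge"],
    },
}


def classify(text, taxonomy=TAXONOMY):
    """Return sorted (category, subcategory) pairs whose terms appear in text."""
    lowered = text.lower()
    matches = set()
    for category, subcategories in taxonomy.items():
        for subcategory, terms in subcategories.items():
            for term in terms:
                # Whole-word match, so "bill" does not fire on "billion".
                if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                    matches.add((category, subcategory))
                    break
    return sorted(matches)


note = "Customer says the battery will not hold a charge and disputes the latest bill."
print(classify(note))
```

Even in a toy like this, the upfront work shows: someone has to decide that "dropped call" belongs under coverage, and the term lists will need more than one round of refinement as you see what the matching misses.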

So remember, new ways of doing things generally involve challenges, and text analytics is no exception. Overcoming these challenges will require time, training, and persistence.

What are your biggest challenges (or what do you imagine them to be) regarding a text analytics implementation? Share below.

Fern Halper,

Fern Halper is the director of TDWI research for advanced analytics, focusing on predictive analytics, social media analysis, text analytics, cloud computing, and other big-data analytics approaches. She has more than 20 years of experience in data and business analysis and has published numerous articles on data mining and information technology. Halper is co-author of "Dummies" books on cloud computing, hybrid cloud, service-oriented architecture, and service management, as well as Big Data for Dummies. She has been a partner at industry analyst firm Hurwitz & Associates and a lead analyst for Bell Labs. Her PhD is from Texas A&M University. You may reach her at fhalper@tdwi.


The Top of the Top Five
  • 10/19/2012 8:36:52 PM

Great post! A timely topic, becoming even timelier. In my view, the taxonomy is the biggest of the five you've listed. There are so many variations on ways to create a taxonomy, because the data are unstructured, so there's little embedded guidance as to how to structure your taxonomy. This, I believe, is why organizations absolutely must commit to documentation, standards, and repeatable measures in their text analytics activities. Otherwise it's going to be very difficult to meet the other challenges (such as believing/trusting the data and results).

Re: Great Post
  • 10/19/2012 7:39:15 PM

Interesting. Text Analytics software used for ROI should be deployed and should compete with other methods of solving problems. Organizations can select Text Analytics if it produces a better result, or solves the problem better, than other analytics methods.

Re: Great Post
  • 10/19/2012 8:19:25 AM

Yes, it's true that turning the soft benefit of text analytics into an ROI is difficult, as Beth mentioned. Or that today, most text analytics projects do not have ROI as Fern says, but in at least 2 of the 5 reasons Fern lists in her post (customer service routing or deflection and lead generation), it is pretty common to use ROI models to evaluate the investment in text analytics. Obviously, this is more difficult in voice of the customer or customer experience optimization projects. This is unfortunate because these are probably the initiatives that bring the highest ROI to organizations, as usually they contribute to both revenue growth and cost reduction. However, I think this is more because predictive models accounting for "soft" variables are still in development. I think the analogy with weather forecasting that I wrote about here is valid. When these new models are developed, it will be a great day for text analytics because adoption will grow significantly. We'll see if I'm right :). In any case, thanks to both of you for the high quality work you are doing on this site.

Re: Great Post
  • 10/18/2012 3:17:54 PM

Thanks @lscagliarini for your kind words about the post. You bring up a good point about ROI. Interestingly, a large number of the companies I speak to don't have to do an ROI analysis for their text analytics software.

Re: Great Post
  • 10/18/2012 10:51:01 AM

@lscagliarini, first off, thanks for jumping onto the message boards. Welcome, and I'm glad you found value in Fern's post! You raise a great point, as well as a potential difficulty, I'd say. The soft benefits associated with being able to analyze text quickly and efficiently are fantastic. But wouldn't you say they're awfully difficult to work into a formal ROI statement? 

Great Post
  • 10/18/2012 9:07:59 AM

Perfect analysis. I would add, from my experience, that ROI calculation for text analytics projects, especially the most strategic ones, often requires accounting for the cost of ignorance (i.e., loss of potential revenue or cost savings) that cannot be easily calculated using traditional data-based financial models. What are the costs of not intercepting a complaint about your product in time, of finding out too late about an existing patent in an area related to your R&D project, or of not being able to immediately identify an employee with the right skill set to address an urgent organizational issue? This is not so different from what was happening to supply chains 15-20 years ago before the right models to deal with their complexity were developed. In any case, this is a great post, one that I'm sure to refer to again.

Re: Sarcasm
  • 10/16/2012 3:55:36 PM

I guess that would depend on a) what tool was used and how finely tuned the sentiment was and b) who the analyst was that was doing the analysis and whether they tuned the sentiment. I would be suspicious of anyone making the claim that mentions somehow equate to who won a debate in any event -- I'd have to see the other part of the analysis! Goes to show you how someone can use an analysis and say what they want from it...

Re: Sarcasm
  • 10/16/2012 2:36:06 PM


The morning following the first presidential debate, a Twitter analysis showed that Romney had significantly more mentions than Obama -- and this was cited as part of the evidence that Romney won.

Given the widespread use of sarcasm and irony around the topic of presidential politics, is this data even 70% reliable?


BTW - The second presidential debate is tonight.

Re: Access Control
  • 10/16/2012 2:10:41 PM

Alexis, right -- lots of front end work required to make sure you're working with quality data. In a way, that's no different with text analytics than any other sort of analytics, at a basic level, at least.

Re: Access Control
  • 10/16/2012 2:08:52 PM

@Noreen, for some reason I'm having a hard time reconciling the idea that a grocer that lets a store become a "nasty, dirty place" would be bothering with text analytics. That seems a disconnect to me -- why invest in measuring customer sentiment using advanced analytics tools if you can't even bother to pick up a broom and a mop?
