Why Sentiment Analysis Doesn't Depend on Text Analytics

Sentiment analysis is hot, but what, precisely, is it? For those seeking solutions that will help better meet customer experience, market research, product quality, and social/media analysis challenges, where (in what technology category) should they look?

My definition: Sentiment analysis is the effort to systematically detect and evaluate opinions, attitudes, and emotions in a spectrum of personal, online, social, and enterprise information sources. Sentiment sources range over the spectrum of content types -- images, audio, video, and text -- and extend to transaction records that can be mined for sentiment-indicating behaviors.

Seen from a broad perspective, sentiment analysis involves content analysis and the sort of number crunching, visual data exploration, and results delivery that are immediately familiar to anyone versed in data mining and business intelligence. From this vantage point, sentiment analysis draws on, but is not a subset of, text analytics. Text analytics does help you get at sentiment in textual sources, but there are many more sentiment sources out there than just text. Further, as we'll see, you don't even necessarily need text analytics to get at sentiment in text.

If sentiment analysis were a text analytics subset, then a smile, yelling, an angry gesture, and dwell-time on a Web page would all count for nothing. Yet they don't count for nothing; they contain personal and business value. They express mood, attitude, and emotion that are conveyed visually, audibly, and via movement, but they're non-textual and thus can't be parsed directly via text analytics.

Sure, you can transcribe speech to written text and describe an image in words, but you’ll incur loss of context and fidelity and, hence, lower analytical accuracy. Better to pull data from these sources in their native forms. It can be done: A $59 consumer-grade camera can detect a smile. Leading-edge call-center solutions detect emotion by modeling the volume, pace, and intonation of speech. Images and speech, mined for mood and emotion, are non-textual -- so no, again, sentiment analysis is not a subset of text analytics.

My four examples illustrate non-textual ways humans communicate. The first three, a smile, yelling, and an angry gesture, directly convey mood and emotion. The fourth, dwell-time on a Web page, signals interest and intent, as do purchases and other actions that aggregate to behavior patterns. Survey responses, consumer and business spending trends, commercial inventories: Economists and business forecasters have inferred economic sentiment from these measures for decades. These methods do not involve text analytics.
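To make the dwell-time example concrete, here is a minimal Python sketch of reading interest from behavior with no text involved. Everything in it is an assumption made for illustration: the `PAGE_VIEWS` records, the `interest_signals` helper, and the rule of thumb that a visit lasting at least twice a page's median dwell time signals interest. It is not a description of how any particular product works.

```python
from statistics import median

# Hypothetical page-view records: (visitor_id, page, dwell_seconds).
# Both the data and the 2x-median rule below are illustrative assumptions.
PAGE_VIEWS = [
    ("v1", "/pricing", 95.0),
    ("v2", "/pricing", 8.0),
    ("v3", "/pricing", 12.0),
    ("v1", "/blog", 20.0),
    ("v2", "/blog", 10.0),
]


def interest_signals(views, factor=2.0):
    """Return (visitor, page) pairs whose dwell time is at least
    `factor` times the median dwell time for that page."""
    # Group dwell times by page so each page gets its own baseline.
    by_page = {}
    for _, page, seconds in views:
        by_page.setdefault(page, []).append(seconds)
    medians = {page: median(times) for page, times in by_page.items()}

    # Keep only the views that stand out against their page's baseline.
    return [
        (visitor, page)
        for visitor, page, seconds in views
        if seconds >= factor * medians[page]
    ]


print(interest_signals(PAGE_VIEWS))
```

The per-page median is used as the baseline because some pages are simply longer than others; a real deployment would need to tune the threshold and account for idle browser tabs.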

Finally, crowdsourcing is an important technique that applies human judgment to larger-scale tasks. Want to classify 200,000 photos? No problem! Provider examples include CrowdFlower and Crowd Control Software (layered on Amazon Mechanical Turk). They control the evaluation process, ensuring consistency and quality. Human text/content assessment is not typically classed as “text analytics,” but when those assessments are systematized to assess opinions, attitudes, and emotions, they definitely constitute sentiment analysis, further supporting the point that sentiment analysis is not a subset of text analytics.

Text analytics is great stuff, but it’s not the be-all and end-all of sentiment analysis!

Point / Counterpoint,

Seth Grimes is a technology strategy consultant and a recognized expert on business intelligence and text analytics. He is a long-time contributor at TechWeb's InformationWeek and a member of Internet Evolution's ThinkerNet. He is founding chair of the Sentiment Analysis Symposium and the Text Analytics Summit. Seth founded Washington-based Alta Plana Corporation in 1997. He consults, writes, and speaks internationally on information-systems strategy, data management and analysis systems, industry trends, and emerging analytical technologies.

Please visit Seth's on-line business card for more information, and follow Seth on Twitter at @sethgrimes.


Re: Potential
  • 1/29/2012 9:27:42 PM

Joe, you're right.

Another example: Arguably Net Promoter is a form of sentiment analysis.

Re: Fair enough but...
  • 1/29/2012 8:42:42 PM

I'm partial to D major, myself.  ;)

  • 1/29/2012 8:40:39 PM

Thanks for this important reminder, Seth, that -- although it's the first thing many think about when sentiment analysis comes to mind -- text and linguistic sentiment analysis is not the only kind there is.

Even something as simple as a rating on a scale of one to five stars is sentiment analysis.

And video analytics do indeed hold a great deal of promise for companies like retailers -- as they measure customer reactions to displays, sales techniques, and so on.

There is a huge, untapped pool of potential for sentiment analysis -- limited only by what people can innovate.

Sentiment with rules
  • 1/27/2012 11:34:14 AM

It's an interesting idea: should there be "rules" that assure easier interpretation in social media, or does this defeat the purpose of free expression?

Re: Fair enough but...
  • 1/26/2012 11:46:13 AM

Beth, Shawn,

A logical category (or logical type) is a set of things on which all the same operations are meaningful (even if the operation is highly improbable or the meaning can't be observed in the real world). 

Eggs, elephants, and electrons are all in the same logical category because they are physical objects we can detect, even if we would find it hard to fry an elephant, impossible to fry an electron, and unlikely to repel an elephant with a charged plate; all those things "mean" something even if the something isn't possible. (Notice that you can picture them mentally even though you know they're impossible.)

Cheese and Wednesday are in different logical categories; you can't picture digesting Wednesday or putting off a meeting till cheese, and "it rained on cheese" means something completely different from "it rained on Wednesday."

So what I'm saying is that tweets, blog posts, emails, notes tied to bricks, billboards, etc. are in one logical category, whereas enthymemes like comparison, reciprocity, dissociation, etc. are in another, and they are as different as cheese and Wednesday.

That's not to say they don't interact (you can deliver cheese on Wednesday), and for example a sustained metaphor like "God the Father" in the New Testament or like Hamlet's constant comparison of his mother to animals would be hard to do in a single tweet (and would mean something different if you did).  There's probably a minimum amount of room for an enthymeme that is developed to one extent or another.  That's one way in which Twitter's constraints make it an easier place to study what people are saying: they can't be saying anything as profound as the Bible or Shakespeare because the enthymemes won't fit.  On the other hand, enthymemes tend to be hologrammatic -- you can miss pieces of them and still get the idea, but the fewer pieces you miss, the more accurate your impression is.  So when you force the tweeters to throw away so many pieces of what they are thinking, you also make the enthymemes fuzzier, vaguer, and harder to discern. 

As for my favorite enthymeme, that's a bit like asking a musician his favorite chord.  All depends where it is and what it's used for.  And I tend to agree with C.S. Peirce that logic is everywhere; there's vast amounts of it in Twitter, and pretty much everywhere, actually.

Re: Fair enough but...
  • 1/26/2012 9:38:33 AM

Well, I was being tongue-in-cheek, but as always I enjoy your answers -- and this one leads me to two new questions. 1) You say, "I think the tweet is in a different category, logically." Does that mean to suggest that you feel there IS logic to be applied to the twitterverse? 2) Just out of curiosity, and because you clearly so love language and its constructs, what's your favorite enthymeme?

Re: Fair enough but...
  • 1/25/2012 11:39:57 PM

Interesting. Can you elaborate?

Re: Fair enough but...
  • 1/25/2012 11:32:29 PM

I think that is a classic "or" question that can be answered with "yes."  I.e. sometimes one, sometimes the other, it depends.

Re: Fair enough but...
  • 1/25/2012 11:30:58 PM

John and Beth,

This discussion is fascinating. From a sentiment analysis point of view, I do wonder whether Twitter's constraints make it a better or worse tool for gathering participant moods.

Re: Fair enough but...
  • 1/25/2012 10:07:03 PM

I think the tweet is in a different category, logically, Beth; you could express any of the enthymemes in a tweet, at least sketchily. A tweet is more like a haiku, sonnet, or limerick: a few strict rules and a bunch of surrounding customs. (E.g. limericks are rarely used for funeral odes, and Petrarchan sonnets usually begin with an observation and slide into a comment, but it would still be a limerick or sonnet if you broke the custom. But four lines of iambic trimeter is not a limerick and ten rhyming couplets is not a sonnet.) Some enthymemes definitely tweet easier than others, but I think that's custom.
