Lots of editing and cleaning is needed for tweets, and some of it can be automated. We spend about 30% of our time cleaning tweets before we can bring them into the text miner. Then we often find more issues and go back and forth.
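Some of the routine cleanup can be scripted. Below is a minimal sketch, assuming the automatable steps are decoding HTML entities, dropping the retweet marker, links, and @mentions, keeping hashtag words, and collapsing whitespace. The function name `clean_tweet` and these specific rules are hypothetical illustrations. This is not the speaker's actual pipeline, which was built in SAS.

```python
import html
import re

# Hypothetical cleaning rules; a real project would tune these to its own data.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+:?")   # @handle, plus a trailing colon if present
RT_RE = re.compile(r"^RT\s+")        # leading retweet marker
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Apply simple, automatable cleanup steps to one tweet."""
    text = html.unescape(text)       # e.g. "&amp;" -> "&"
    text = RT_RE.sub("", text)       # drop leading "RT "
    text = URL_RE.sub("", text)      # drop links
    text = MENTION_RE.sub("", text)  # drop @mentions
    text = text.replace("#", "")     # keep hashtag words, drop the '#'
    text = WHITESPACE_RE.sub(" ", text).strip()
    return text.lower()


if __name__ == "__main__":
    raw = "RT @foodfan: Loved the new burger &amp; fries!! #yum https://t.co/abc123"
    print(clean_tweet(raw))
```

Rules like these handle the mechanical part. The back-and-forth described above, such as spotting new slang or spam patterns, still needs a person to look at the output.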
Thanks for so many comments. I hope you got enough ideas from the talk. The book will give you more details and references.
If you are interested in our online program, please see the website mentioned on the slide. We also have a very big full-time program, and if you are interested in hiring students from my program (as interns or full-time), do not hesitate to contact me.
Goutam, do you think starting with an existing model and supplementing it with text analytics is a good way to go (as in the couple of case examples you cited)? Or does starting from scratch work better? Or is that too hard to call?
In this case, a single person did the initial annotation and, as you would expect, not all of the annotations were good. We (my graduate students and I) found several instances of wrong annotation, and we had to go back to the company to check and modify them.
Typically, for a research project I like to use at least two independent annotations with at least 85% inter-rater reliability.
Honestly, I use SAS almost exclusively (although my students sometimes use R as well), but some of this can be done with other software too. Perhaps you should do a proof-of-concept project to see what's possible.
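The comment doesn't say which reliability measure sits behind the 85% figure. Below is a minimal sketch, assuming two annotators labeled the same items. It computes simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The labels and function names are hypothetical.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of items on which the two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement if each annotator labeled at random with their own label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both used one identical label throughout; kappa is undefined, treat as perfect
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    rater1 = ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg", "pos"]
    rater2 = ["pos", "neg", "neu", "pos", "neg", "neg", "neu", "pos", "neg", "pos"]
    print(percent_agreement(rater1, rater2), cohens_kappa(rater1, rater2))
```

In this made-up example the raters agree on 9 of 10 items (90%), but kappa is only about 0.84. A team applying an 85% threshold should therefore state which measure it means.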
Goutam, during the lecture you mentioned that cleaning up the tweets in the fast-food example was quite an effort. Do you by chance know what percentage of project time went into that effort? And, would that be a good rule of thumb to keep in mind if you're going to be working with tweets for text/sentiment analytics?
Maryam, a neutral sentiment is definitely not optimal from a marketing perspective. But an algorithm may flag tweets/posts as neutral only because they don't contain distinctly positive or negative terms.
statitician, yes, the experience is part of the brand, and eventually, if the experiences are neutral overall, the customer simply won't buy from that brand because there is nothing unique or special about it. Another brand can lure them away with a better experience.
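To illustrate the point about neutral flags, below is a minimal lexicon-based scorer. It assumes a tiny hand-made word list (hypothetical; production lexicons are far larger) and labels any post with no matching terms as neutral.

```python
import re

# Hypothetical toy lexicon; real sentiment lexicons contain thousands of terms.
POSITIVE = {"love", "great", "delicious", "fast", "friendly"}
NEGATIVE = {"hate", "cold", "slow", "rude", "gross"}


def lexicon_sentiment(text: str) -> str:
    """Label text by counting lexicon hits; no hits (or a tie) means 'neutral'."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in [
        "Love the fries, staff was friendly",
        "Fries were cold and the service was slow",
        "Waited 40 minutes for a burger. Never again.",
    ]:
        print(lexicon_sentiment(post), "|", post)
```

The third post is clearly a complaint. It scores as neutral only because it contains none of the listed terms, which is the kind of false neutral described above.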
How do your tools compensate for typos, mixed languages (Spanglish), and text shorthand? A lot of data might not be caught. Do you think there is better value in having analysts read all posts?
@Maryam, I think it does show that it hurt profits, because they are still having a problem two years later. After the scandal started, the good comments went away and never really came back. The damage seems to be permanent.
Chick-fil-A is still having problems, and those comments were made in 2012. It shows how social media can keep scandals and boycotts alive rather than letting them be quickly forgotten. Once it's on the net, it's there forever.
I'm not so sure the fast-food industry will see customers voluntarily providing a lot of unstructured data that contains valid or usable comments, and it would be good to hear more about those possibilities and how the industry will deal with them.
The fast-food industry seems to find it a bit more difficult to use the data without causing privacy or public relations issues among customers, but it will be interesting to see how well the data might predict some outcomes.
Scoring customer comments into negative and positive attributes seems easy enough, but how do you correlate them when, in certain circumstances, negative comments outnumber positive ones in a way that might not accurately reflect the sentiment of all customers?
Michael, I don't generally think there's some insidious plot to promise anonymized data without anonymizing it completely, but I think there's real concern among many people that if somebody works hard enough at it, a person can be linked to his or her anonymized data.
@kq4ym, I think that depends on the question you're trying to answer, right? Ask the question, then determine what data could help you answer it. Think in that context rather than in terms of internal/external or structured/unstructured.
I wonder what the differences may be in deciding which to use, external or internal data, and how to choose between structured and unstructured data for the best balance of reliable predictability versus the cost of programming time and effort.
With the meteoric rise in available customer text data, I would think that keeping up with the ability to store, maintain, and manipulate this data may be a challenge for smaller companies that lack the experience needed to use predictive analytics.
The Oklahoma State online graduate certificate in marketing analytics, for professionals with either a technical or a non-technical background, looks like a pretty cool idea, with only a 12-hour minimum class load required.
Customer intelligence (CI) is the process of gathering and analyzing information regarding customers (their details and their activities) in order to build deeper and more effective customer relationships and improve strategic decision making. (Wiki)