Meta S. Brown

Mo' Data Blues

Noreen Seebacher
User Rank
Blogger
Re: What is the goal?
Noreen Seebacher   9/6/2013 1:29:22 AM
Cats? Someone say cats? Try these ones: http://www.buzzfeed.com/chelseamarshall/cats-who-have-given-up-on-everything?s=mobile

SethBreedlove
User Rank
Data Doctor
Re: What is the goal?
SethBreedlove   9/5/2013 5:04:39 PM
@SaneIT, I'm sure it is mostly cat videos. I have a friend whose cat has his own Facebook page, and that is not uncommon.

So I guess the question is: what is the source of most of this data? Is it financial institutions or Yelp? What is really relevant? Are all these cat videos driving up storage costs? (And yes, I watch cat videos.)

 

SaneIT
User Rank
Data Doctor
Re: What is the goal?
SaneIT   9/5/2013 7:17:44 AM
"5247GB of data for every person in the world."


That's a sobering thought, especially when you realize someone has to manage that data. As the data sets grow, it's going to be more important that the quality of data rises; otherwise you'll never be able to make sense of the data you do have.

SethBreedlove
User Rank
Data Doctor
Re: What is the goal?
SethBreedlove   9/4/2013 8:22:46 PM
I feel that many people think, "Why have 95% certainty when I can have 100%?" Of course, especially with sentiment analysis, even with all of the data, interpretation will never be 100% certain.

There have been times when I've had a small number of subjects to analyze and including everyone wasn't a problem, but when data sets become massive, one has to pick and choose.

@ SaneIT. Data hoarding is a good way to describe it.  IDC projects that by 2020, there will be approximately 5247GB of data for every person in the world.  Now hoard that!

SaneIT
User Rank
Data Doctor
Re: What is the goal?
SaneIT   9/4/2013 7:28:48 AM
I agree that knowing what you're looking for is more important than how much data you have. I see a big database with good people pulling out good data as a smooth-running warehouse. They know where each part is, they know how to get to it in the most efficient way, and they know how those pieces go together, so they know right away if something looks off. I see the big database without thought as an episode of the show Hoarders. There are just big piles of stuff everywhere, and even though they aren't quite sure what they have or where they have it, they can't get rid of anything because they might want it later.

Meta S. Brown
User Rank
Blogger
Re: Sentiment Analysis
Meta S. Brown   9/3/2013 8:01:27 PM
If the goal is to summarize the sentiment expressed across a large group of messages, then I agree that using a sample and manually assessing sentiment in each is often a good way to go. There are limitations to that approach, though.

If you have many such groups of messages to evaluate, then time is a limiting factor. Sentiment categorization often uses a complex set of rules, and many organizations have had bad experiences using junior staff or outside vendors to perform categorization, due to lack of experience and high turnover of these analysts. To be fair, lack of training is also an issue. Not every organization is willing to invest in training analysts, either. So, for speed and consistency, it may be more effective to use automated techniques.

PredictableChaos
User Rank
Data Doctor
Sentiment Analysis
PredictableChaos   9/3/2013 7:11:06 PM
 

In sentiment analysis, I think you've chosen an especially good example of the weakness of large data sets. Sometimes, maybe even often, reading the full Tweet leads a human to a different sentiment conclusion than the one software reaches by just parsing words.

Maybe it would be better to randomly select a couple hundred Tweets and just read them all?  Or select a thousand and have a (paid) intern read them all?

PC

 

BethSchultz
User Rank
Blogger
Re: When big is small, or vice versa
BethSchultz   9/3/2013 10:32:23 AM
Gil always has interesting insight to share. Thanks!

 

Meta S. Brown
User Rank
Blogger
Re: When big is small, or vice versa
Meta S. Brown   9/3/2013 9:58:20 AM
Speaking of buzzwords, readers might enjoy this Forbes post by Gil Press:

Data Science: What's The Half-Life Of A Buzzword? - Forbes

http://bit.ly/metaq004

Gil discusses the proliferation of new university programs in data science, and presents differing views about them.

BethSchultz
User Rank
Blogger
Re: When big is small, or vice versa
BethSchultz   9/3/2013 9:49:08 AM
So maybe soon we'll be back to calling it just plain old "data"...  until some new buzzword strikes everybody's fancy.

 
