Marshall Sponder

What to Do With Unstructured Data

Harinath Vicky
User Rank
Prospector
Re: cleaning with integrity
Harinath Vicky   10/30/2014 1:10:11 PM
Data searched from several different kinds of unstructured sources (mostly text-based at the moment) can be restructured (flattened) quickly and inexpensively via this tool within IRI NextForm.

ruehmkorf
User Rank
Prospector
Re: Graphic
ruehmkorf   6/1/2014 10:28:55 AM
Hi BritInBigD, 

As far as I can tell, that is taken from a paper from The Data Warehousing Institute. See the corresponding article for more: BI Search and Text Analytics

BritInBigD
User Rank
Prospector
Graphic
BritInBigD   1/21/2013 8:45:31 AM
Hi Marshall,

I was curious to know the source of the graphic that accompanies your post (and specifically the indicative growth rates shown for the different data categories). Is that data from IDC? 

Maryam@Impact
User Rank
Blogger
Re: cleaning with integrity
Maryam@Impact   12/31/2012 6:56:50 PM
@Hospice I agree, but I don't know of any standard protocols because there is so much variability in unstructured data depending on industry, etc. It is still a challenge to get standards.

kicheko
User Rank
Blogger
Re: Yikes - some misconceptions here that need to be addressed
kicheko   12/31/2012 8:50:05 AM
webmetricsguru, one big challenge I see is the analysis of sentiment-oriented data, even though the area has grown a great deal over the course of this year (new apps and all). This kind of data, at least for now, may still need to be reviewed more closely, because it's difficult to automate it with one given pattern without locking out new sentiments that fall outside that original set. However, as intelligent systems learn this data, it will cut down what we have to look at, even in sentiment analytics.
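
To make the point about a fixed pattern locking out new sentiments concrete, here is a minimal sketch. It assumes a tiny hand-made lexicon (`LEXICON`, `triage`, and `split_for_review` are hypothetical names invented for illustration, not taken from any product mentioned in this thread). Text that matches no known word is not scored; it is routed to a person for review instead.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicon (an assumption for this sketch).
LEXICON = {"love": 1, "great": 1, "hate": -1, "awful": -1}


def triage(text, lexicon=LEXICON):
    """Score text with the lexicon; return None if no known word matched."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return None  # outside the original pattern set: needs human review
    return sum(hits)


def split_for_review(texts):
    """Separate texts into automatically scored ones and ones needing review."""
    scored, review = [], []
    for text in texts:
        score = triage(text)
        if score is None:
            review.append(text)
        else:
            scored.append((text, score))
    return scored, review


scored, review = split_for_review([
    "I love this phone",
    "Awful battery, hate it",
    "This update is straight fire",  # newer slang the lexicon has never seen
])
print(scored)
print(review)
```

The third comment uses wording the lexicon does not cover, so it lands in the review pile rather than being silently scored as neutral.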

webmetricsguru
User Rank
Blogger
Re: Yikes - some misconceptions here that need to be addressed
webmetricsguru   12/30/2012 11:35:11 PM
If I understand you correctly, that's the bane of the PR / Marcom industry: you can actually look at a bunch of verbatims (maybe that's OK for 10-30), but what happens when you have thousands or more?

Personally, I think a discussion of just what cleaning data is and how best to do it would be good for AllAnalytics.com. I'd like to see what we come up with, and I bet a lot of others would too.

Hospice_Houngbo
User Rank
Prospector
Re: Yikes - some misconceptions here that need to be addressed
Hospice_Houngbo   12/30/2012 11:21:17 PM
"If we can find the essential information or pattern, we might not need to look at most of it"

I see. I suppose that those hand-written patterns can only be domain-specific and will be difficult to generalize. I agree that extracting the most useful patterns might be enough in most cases, as it is difficult to think of all possible patterns. One drawback of such a model is that human-written patterns are often low-recall, even if precision is high.
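
The precision/recall trade-off mentioned here can be illustrated with a minimal sketch. The labeled comments and the single regex below are made-up toy data (assumptions for illustration only); the idea is simply to treat a hand-written pattern as a yes/no classifier and measure it.

```python
import re

# Hypothetical toy data: (comment text, True if it really is a complaint).
LABELED = [
    ("I want a refund for this order", True),
    ("Please refund my money", True),
    ("This product broke after a day", True),
    ("Terrible service, never again", True),
    ("Great product, works well", False),
    ("How do I change my password?", False),
]

# Hand-written, domain-specific pattern: only catches explicit refund requests.
PATTERN = re.compile(r"\brefund\b", re.IGNORECASE)


def precision_recall(pattern, labeled):
    """Return (precision, recall) of a regex used as a binary classifier."""
    tp = fp = fn = 0
    for text, is_positive in labeled:
        predicted = bool(pattern.search(text))
        if predicted and is_positive:
            tp += 1  # pattern fired on a real complaint
        elif predicted and not is_positive:
            fp += 1  # pattern fired on a non-complaint
        elif not predicted and is_positive:
            fn += 1  # real complaint the pattern missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


precision, recall = precision_recall(PATTERN, LABELED)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=1.00 recall=0.50
```

Every match is a real complaint (high precision), but half of the complaints never mention a refund and are missed (low recall).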

Hospice_Houngbo
User Rank
Prospector
Re: Step 1
Hospice_Houngbo   12/30/2012 11:05:50 PM
It is true that some of the points you mentioned are debatable, like "Distribute the data in the cloud". But they are valid points to take into account when dealing with unstructured data. As for the question of how to clean unstructured data, I think that it depends on the shape and the model that has been defined.

webmetricsguru
User Rank
Blogger
Re: Yikes - some misconceptions here that need to be addressed
webmetricsguru   12/30/2012 11:05:17 PM
What I meant is that currently, people usually end up needing to look at the data to understand it (because it is unstructured information), and attempts to use software to understand it, in my opinion, won't work, at least not today. What you can do, I think, and maybe our friends here can confirm or argue this, is cut down on what we have to look at. If we can find the essential information or pattern, we might not need to look at most of it - and hopefully the software created can help surface that information, and maybe that's the best we can hope for (big data hype or not). At any rate, this is an interesting discussion and I don't have all the answers, but I am wondering just what they are.

Hospice_Houngbo
User Rank
Prospector
Re: Yikes - some misconceptions here that need to be addressed
Hospice_Houngbo   12/30/2012 10:52:47 PM
@marshall,

"I define it as something a human needs to look at to fully process"

I still don't get it. Do you mean that human intervention is needed to figure out whether the data is unstructured or not? Won't it be time-consuming and practically impossible for a human to go through all the instances of the data, given its size? Maybe that is not what you mean?
